DESCRIPTION



HA4, A NEW OSTEOBLAST- AND CHONDROCYTE-SPECIFIC SMALL SECRETED 



This application claims the priority of U.S. Provisional Patent Application No. 
60/423,690, filed November 4, 2002, the entire disclosure of which is specifically incorporated 
herein by reference. The government owns rights in the present invention pursuant to grant 
number P01AR42919 from the National Institutes of Health. 

1. Field of the Invention

The present invention relates generally to molecular mechanisms of bone formation. 
More specifically, the invention relates to an osteoblast- and chondrocyte-specific small
secreted peptide designated as HA4 that is involved in bone formation. 

2. Description of Related Art 

Bone formation is a carefully controlled developmental process involving morphogen- 
mediated patterning signals that define areas of initial mesenchyme condensation, followed by 
induction of cell-specific differentiation programs to produce chondrocytes and osteoblasts. 
Positional information is conveyed via gradients of molecules, such as Sonic Hedgehog, that are 
released from cells within a particular morphogenic field together with region-specific patterns 
of hox gene expression. These molecules in turn regulate the localized production of bone 
morphogenetic proteins and related molecules which initiate chondrocyte- and osteoblast- 
specific differentiation programs. 

Differentiation requires the initial commitment of mesenchymal stem cells to a given 
lineage, followed by induction of tissue-specific patterns of gene expression. Considerable 
information about the control of osteoblast-specific gene expression has come from analysis of 
the promoter regions of genes encoding proteins like osteocalcin that are selectively expressed in 
bone. Both general and tissue-specific transcription factors control this promoter. Osf2/Cbfa1,
the first osteoblast-specific transcription factor to be identified, is expressed early in the
osteoblast lineage and interacts with specific DNA sequences in the osteocalcin promoter
essential for its selective expression in osteoblasts (Franceschi, 1999). Cbfa1 is needed for
osteoblast differentiation.

The reduced bone mineral density (BMD) observed in osteoporosis results, in part, from 
reduced activity of bone-forming osteoblasts (Jackson, 2000). The identification of factors that 
participate in the cell differentiation process has been beneficial in developing treatment
protocols for osteoporosis. However, it is likely that other factors participate in the
differentiation process as well. Thus, it would be beneficial to identify these factors, both for
their use in the diagnosis of degenerative bone disease and in its treatment.



SUMMARY OF THE INVENTION

The present invention is drawn to HA4 polypeptides, as well as DNA segments encoding 
HA4 polypeptides. The present invention also provides methods of making such DNA segments 
and polypeptides, as well as their use in drug screening, diagnosis and therapy of bone disease. 
Antibodies and transgenic animals and cells relating to HA4 are disclosed as well. 

Thus, in a particular aspect of the invention, there is provided a purified or a substantially
purified HA4 protein or polypeptide. Generally, "purified" will refer to a protein or peptide 
composition that has been subjected to fractionation to remove various other components, and 
which composition substantially retains its expressed biological activity. Where the term
"substantially purified" is used, this designation will refer to a composition in which the protein
or peptide forms the major component of the composition, such as constituting about 50%, about
60%, about 70%, about 80%, about 90%, about 95% or more of the proteins in the composition. 
In certain embodiments, the protein or polypeptide of the invention may be operatively linked to 
a second polypeptide sequence. It is also contemplated that purified or substantially purified 
peptides and polypeptides of between about 5 and 244 amino acids in length comprising a
contiguous sequence from SEQ ID NO:2 are encompassed by the invention. Thus, for example,
the invention contemplates polypeptides or proteins of about 5, 10, 15, 20, 25, 30, 35, 40,
45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200,
210, 220, 230, 240 or 244 contiguous amino acids of SEQ ID NO:2.
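
By way of illustration only, the notion of a polypeptide consisting of a given number of contiguous amino acids of a longer sequence can be sketched in a few lines of Python. The function name and the 20-residue placeholder string are hypothetical and are used only for illustration; the placeholder is not SEQ ID NO:2.

```python
def contiguous_fragments(sequence: str, length: int) -> list[str]:
    """Return every fragment of `length` contiguous residues from `sequence`."""
    if length < 1 or length > len(sequence):
        return []
    # Slide a window of the requested length along the sequence.
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


# Hypothetical placeholder peptide (NOT SEQ ID NO:2), one-letter amino acid codes.
placeholder = "ACDEFGHIKLMNPQRSTVWY"

for n in (5, 10, 15, 20):
    fragments = contiguous_fragments(placeholder, n)
    print(n, len(fragments), fragments[0])
```

A sequence of N residues yields N - n + 1 distinct windows of length n, so a 244-residue polypeptide would yield 240 windows of five contiguous residues.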

In another embodiment, an isolated nucleic acid segment encoding a polypeptide
comprising the sequence as shown in SEQ ID NO:2 is provided. The nucleic acid segment may
comprise the DNA sequence as shown in SEQ ID NO:1. The nucleic acid segment may further
comprise a promoter operably linked to the region encoding the protein. The promoter may be 
an inducible promoter, a constitutive promoter or a tissue specific promoter. The tissue specific 
promoter may be a bone specific promoter. The nucleic acid segment may be comprised within
a viral vector, such as an adenoviral vector, a retroviral vector, an adeno-associated viral vector, 
a vaccinia viral vector, a herpesviral vector or a pox viral vector. The nucleic acid segment may 
be comprised within a non-viral vector. The non-viral vector may be comprised in a lipid 
carrier. The nucleic acid segment may further comprise a region encoding a selectable marker 
protein. 

Examples of constitutive viral promoters include the HSV, TK, RSV, LTR promoter 
sequence from retroviral vectors, SV40 and CMV promoters, of which the CMV promoter is a 
currently preferred example. Examples of constitutive mammalian promoters include various 
housekeeping gene promoters, as exemplified by the β-actin promoter. Other promoters may be
dectin-1, dectin-2, human CD11c, F4/80, SM22, RSV, SV40, Ad MLP, β-actin, MHC class I or
MHC class II promoters.

Inducible promoters and/or regulatory elements are also contemplated for use with the 
expression vectors of the invention. Examples of suitable inducible promoters include promoters 
from genes such as cytochrome P450 genes, heat shock protein genes, metallothionein genes, 
hormone-inducible genes, such as the estrogen gene promoter, and such like. Promoters that are 
activated in response to exposure to ionizing radiation, such as fas, jun and egr-1, are also
contemplated. 

Tissue-specific promoters and/or regulatory elements will be particularly useful in certain 
embodiments. Osteoblast-specific promoters that will be used are the 2.3 kb promoter of the
mouse gene for pro-α1(I) collagen and the 1.1 kb mouse osteocalcin promoter.

The nucleic acid segment also may be characterized as (a) a nucleic acid segment 
comprising a sequence region that consists of 14 nucleotides that have the same sequence as, or
are complementary to, at least 14 contiguous nucleotides of SEQ ID NO:1; or (b) a nucleic acid
segment of from 14 to 10,000 nucleotides in length that hybridizes to the nucleic acid segment of
SEQ ID NO:1, or the complement thereof, under stringent hybridization conditions. The
segment may comprise a sequence region of at least 14, 17, 20, 25 or 30 contiguous nucleotides
from SEQ ID NO:1 or the complement thereof. The segment may be 17, 20, 25 or 30
nucleotides in length. 
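
As a non-limiting illustration of characterization (a), the following Python sketch tests whether a candidate segment contains a run of at least 14 contiguous nucleotides identical to a reference sequence or to its complement. Several points are assumptions made for the sketch: "complementary" is read as the antiparallel reverse complement, the reference string is a hypothetical placeholder rather than SEQ ID NO:1, and the function names are invented here. Hybridization under stringent conditions, as in characterization (b), is an experimental property and is not modeled.

```python
# Translation table for the four DNA bases (upper case only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def shares_contiguous_run(candidate: str, reference: str, min_len: int = 14) -> bool:
    """True if `candidate` contains at least `min_len` contiguous nucleotides
    that also occur in `reference` or in its reverse complement."""
    candidate = candidate.upper()
    targets = (reference.upper(), reverse_complement(reference))
    for i in range(len(candidate) - min_len + 1):
        window = candidate[i:i + min_len]
        if any(window in target for target in targets):
            return True
    return False


# Hypothetical placeholder reference (NOT SEQ ID NO:1).
reference = "ATGGCCTTAGCGTACGATCGTAGCTAGGCTA"
probe = reference[5:19]                      # 14 contiguous nucleotides
antisense_probe = reverse_complement(probe)  # complementary 14-mer

print(shares_contiguous_run(probe, reference))
print(shares_contiguous_run(antisense_probe, reference))
```

Any run longer than 14 nucleotides necessarily contains a matching 14-mer, so checking windows of exactly the minimum length is sufficient.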

Nucleic acids of the invention may also be operatively linked to other protein-encoding 
nucleic acid sequences. This will generally result in the production of a fusion protein following 
expression of such a nucleic acid construct. Both N-terminal and C-terminal fusion proteins are 
contemplated. Virtually any protein- or polypeptide-encoding DNA sequence, or combinations 
thereof, may be fused to an HA4 sequence in order to encode a fusion protein. This includes
DNA sequences that encode targeting polypeptides, therapeutic proteins, proteins for 
recombinant expression, proteins to which one or more targeting polypeptides is attached, 
protein subunits and the like. 

The invention further includes DNA segments comprising the 5' untranslated regions (5'
UTR) and 3' UTR of HA4 genomic DNA, and 5'-flanking regions and 3'-flanking regions of
HA4, including those that regulate HA4 expression. The inventors contemplate experiments 
wherein an isolated promoter fragment of the HA4 gene will be used to drive transcription of a 
reporter gene such as the luciferase gene in recombinant cells or transgenic mice. Thus, in one
aspect of the invention, a DNA segment comprising the 5'-flanking regions of HA4 operatively
linked to a heterologous gene or a DNA segment that encodes a selected protein (e.g., a
screenable marker) is contemplated. Osteoblast promoters may be used to obtain targeted
expression of a gene in osteoblasts. 

Vectors and plasmids may be constructed with at least one multiple cloning site. In
certain embodiments, the expression vector will comprise a multiple cloning site that is
operatively positioned between a promoter and an HA4 gene sequence. Such vectors may be 
used, in addition to their uses in other embodiments, to create N-terminal fusion proteins by 
cloning a second protein-encoding DNA segment into the multiple cloning site so that it is 
contiguous and in-frame with the HA4 sequence. 

In other embodiments, expression vectors may comprise a multiple cloning site that is
operatively positioned downstream from the expressible HA4 gene sequence. These vectors are
useful, in addition to other uses, in creating C-terminal fusion proteins by cloning a second 
protein-encoding DNA segment into the multiple cloning site so that it is contiguous and
in-frame with the HA4 sequence. Vectors and plasmids in which a second protein- or
RNA-encoding nucleic acid segment is also present are, of course, also encompassed by the invention,
irrespective of the nature of the nucleic acid segment itself. Expression vectors may also contain 
other nucleic acid sequences, such as IRES elements, polyadenylation signals, splice 
donor/splice acceptor signals, and the like. 

Particular examples of suitable expression vectors are those adapted for expression using
a recombinant adenoviral, recombinant adeno-associated viral (AAV) or recombinant retroviral
system. Vaccinia virus, herpes simplex virus, cytomegalovirus, and defective hepatitis B 
viruses, amongst others, may also be used. 

Recombinant host cells form another aspect of the present invention. Such host cells will
generally comprise at least one copy of an isolated HA4 gene linked to a heterologous promoter. 
Preferred cells for expression purposes will be prokaryotic host cells or eukaryotic host cells. 
Accordingly, cells such as bacterial, yeast, fungal, insect, nematode and plant cells are also 
possible. An example of a preferred bacterial host cell is E. coli. Examples of suitable
eukaryotic host cells include VERO cells, HeLa cells, cells of Chinese hamster ovary (CHO) cell
lines, COS cells, such as COS-7, and WI38, BHK, HepG2, 3T3, RENT, MDCK, A549, PC12,
K562 and 293 cells. Cells also include transgenic cells derived from transgenic animals 
engineered to overexpress or not express HA4, or to express a screenable marker under the
control of HA4 regulatory signals. The marker may be luciferase, green fluorescent protein or
any other gene whose expression is readily detected. 

Many methods of using HA4 genes are obtained from the present invention, such as 
expressing an HA4 protein in a cell. More specific methods obtained from the invention are 
methods for identifying a modulatory agent that inhibits, stimulates, or modulates the expression
of HA4. Thus, provided is a method for identifying a modulator of HA4 transcription, the
method comprising admixing (i) a cell expressing HA4 or a cell with a reporter gene operably 
linked to an HA4 promoter, and (ii) a candidate substance. A candidate substance that alters the 
transcription of the HA4 gene or reporter gene is a modulator. 

The invention also provides methods for identifying a bone cell stimulatory agent,
comprising the steps of (a) admixing a composition comprising a population of precursor cells
capable of expressing HA4; (b) incubating the admixture with a candidate substance; (c) testing 
the admixture for precursor cell differentiation; and (d) identifying the candidate substance that 
stimulates the differentiation of precursor cells into osteoblasts. In some embodiments, the 
precursor cell may be a mesenchymal precursor cell. The assay may be modified such that the
precursor cells are stimulated to differentiate into osteoblasts, and the candidate substance is
monitored for its ability to inhibit this process. 

Agents that modulate HA4 expression and/or activity may be used to treat a number of 
bone-related diseases, such as osteoporosis, glucocorticoid-induced osteoporosis, Paget's disease,
abnormally increased bone turnover, periodontal disease, tooth loss, bone fractures, rheumatoid
arthritis, periprosthetic osteolysis, osteogenesis imperfecta, metastatic bone disease,
hypercalcemia of malignancy and the like. The inventors contemplate that HA4 proteins and 
HA4 expression constructs will increase HA4 expression and activity, whereas HA4 antisense 
constructs, ribozymes and single-chain antibodies will inhibit HA4 expression and/or activity. 

Additionally, the present invention provides for a non-human transgenic animal, cells of
which comprise one allele of the HA4 gene that does not express a functional HA4 product. The 
non-human transgenic animal may be a mouse. The non-human transgenic animal may 
alternatively have cells which comprise an expression cassette comprising an HA4 5'-regulatory
region operably linked to a screenable marker gene. The screenable marker gene may be luciferase,
green fluorescent protein, or β-galactosidase.

Following longstanding patent law convention, the words "a" and "an," when used in
conjunction with the word "comprising," mean "one or more" in this specification, including the
claims. Other objects, features and advantages of the present invention will become apparent
from the following detailed description. It should be understood, however, that the detailed
description and the specific examples, while indicating preferred embodiments of the invention, 
are given by way of illustration only, since various changes and modifications within the spirit 
and scope of the invention will become apparent to those skilled in the art from this detailed 
description. 



BRIEF DESCRIPTION OF THE DRAWINGS

The following drawings form part of the present specification and are included to further
demonstrate certain aspects of the present invention. The invention may be better understood by
reference to one or more of these drawings in combination with the detailed description of
specific embodiments presented herein.

FIG. 1 - Genomic Structure of HA4 Genes.
FIG. 2 - Northern Blot Analysis of HA4 Expression in Adult Mouse Tissues.
FIG. 3 - Northern Blot Analysis of HA4 Expression During Mouse Embryogenesis.
FIG. 4 - In situ Hybridization of HA4.
FIG. 5 - X-gal Staining of HA4 Heterozygous Embryos.
FIG. 6 - X-gal Staining of an HA4 Heterozygous Embryo.

FIG. 7 - HA4 deficient mice have reduced bone density and provide a mouse model for 
human osteoporosis. 

FIG. 8 - Generation of transgenic mice and detection of HA4 protein in serum. 
FIG. 9 - Production of recombinant HA4 protein. 

DETAILED DESCRIPTION OF THE ILLUSTRATIVE EMBODIMENTS

Bone formation is a complex process that involves the differentiation of mesenchymal 
cell precursors into osteoblasts. It is believed that defects in this process can lead to various 
diseases, including those classified as degenerative bone diseases. The most important of these
diseases is osteoporosis, literally meaning a disease of too little bone, which results in fragility 
fractures that occur with very little trauma. It is becoming progressively more common, partly 
because it is a disease that increases in frequency in patients who are over 60 years of age, a 
segment of the population that is progressively increasing. Osteoporosis is much more common 
in elderly females than in elderly males because the estrogen deficiency that occurs at the time of
menopause leads to increased bone destruction, which is not compensated by a corresponding 
increase in bone formation (i.e., a negative bone balance), resulting in bone loss and, eventually, 
osteoporosis in many females. 

The inventors identified a cDNA encoding a small secreted polypeptide containing a
collagen triple helix repeat, designated as HA4, using a suppression subtraction between
BMP-untreated and BMP-treated chondrogenic ATDC5 cells. In newborn homozygous HA4 mutants,
reduced bone density was observed, and the number of bone trabeculae was markedly reduced.
This phenotype mimics that observed in humans with osteoporosis. The inventors thus have 
demonstrated a role for HA4 in bone and cartilage metabolism. 


I. HA4 Polypeptides 

A. Polypeptides and Peptides

As used herein below, the term HA4 should be interpreted to include not only the HA4 
polypeptide of 244 amino acids, but also glycosylated forms as well as non-glycosylated forms
of the molecule, and other members of the HA4 family. The present invention also encompasses
peptides of about 3 to about 50 amino acids, and polypeptides of greater than 50 amino acids. 
All the "proteinaceous" terms described above may be used interchangeably herein. 

In certain embodiments the size of the at least one proteinaceous molecule may comprise,
but is not limited to, about 1, about 2, about 3, about 4, about 5, about 6, about 7, about 8, about
9, about 10, about 11, about 12, about 13, about 14, about 15, about 16, about 17, about 18, about
19, about 20, about 21, about 22, about 23, about 24, about 25, about 26, about 27, about 28, 
about 29, about 30, about 31, about 32, about 33, about 34, about 35, about 36, about 37, about 
38, about 39, about 40, about 41, about 42, about 43, about 44, about 45, about 46, about 47, 
about 48, about 49, about 50, about 51, about 52, about 53, about 54, about 55, about 56, about
57, about 58, about 59, about 60, about 61, about 62, about 63, about 64, about 65, about 66, 
about 67, about 68, about 69, about 70, about 71, about 72, about 73, about 74, about 75, about 
76, about 77, about 78, about 79, about 80, about 81, about 82, about 83, about 84, about 85,
about 86, about 87, about 88, about 89, about 90, about 91, about 92, about 93, about 94, about
95, about 96, about 97, about 98, about 99, about 100, about 110, about 120, about 130, about 
140, about 150, about 160, about 170, about 180, about 190, about 200, about 210, about 220,
about 230, about 240, to about 244 residues. Fusions of greater size also are contemplated.

As used herein, an "amino acid molecule" refers to any amino acid, amino acid derivative
or amino acid mimic as would be known to one of ordinary skill in the art. In certain
embodiments, the residues of the proteinaceous molecule are sequential, without any non-amino 
molecule interrupting the sequence of amino molecule residues. In other embodiments, the 
sequence may comprise one or more non-amino molecule moieties. In particular embodiments, 
the sequence of residues of the proteinaceous molecule may be interrupted by one or more
non-amino molecule moieties. Accordingly, the term "proteinaceous composition" encompasses
amino molecule sequences comprising at least one of the 20 common amino acids in naturally 
synthesized proteins, or at least one modified or unusual amino acid, including but not limited to 
those shown on Table 1 below. 



TABLE 1
Modified and Unusual Amino Acids

Abbr.   Amino Acid                               Abbr.   Amino Acid
Aad     2-Aminoadipic acid                       EtAsn   N-Ethylasparagine
Baad    3-Aminoadipic acid                       Hyl     Hydroxylysine
Bala    β-alanine, β-Amino-propionic acid        AHyl    allo-Hydroxylysine
Abu     2-Aminobutyric acid                      3Hyp    3-Hydroxyproline
4Abu    4-Aminobutyric acid, piperidinic acid    4Hyp    4-Hydroxyproline
Acp     6-Aminocaproic acid                      Ide     Isodesmosine
Ahe     2-Aminoheptanoic acid                    AIle    allo-Isoleucine
Aib     2-Aminoisobutyric acid                   MeGly   N-Methylglycine, sarcosine
Baib    3-Aminoisobutyric acid                   MeIle   N-Methylisoleucine
Apm     2-Aminopimelic acid                      MeLys   6-N-Methyllysine
Dbu     2,4-Diaminobutyric acid                  MeVal   N-Methylvaline
Des     Desmosine                                Nva     Norvaline
Dpm     2,2'-Diaminopimelic acid                 Nle     Norleucine
Dpr     2,3-Diaminopropionic acid                Orn     Ornithine
EtGly   N-Ethylglycine







In further embodiments, the proteinaceous composition comprises a biocompatible 
protein, polypeptide or peptide. As used herein, the term "biocompatible" refers to a substance 
which produces no significant untoward effects when applied to, or administered to, a given 
organism according to the methods and amounts described herein. Such untoward or undesirable
effects are those such as significant toxicity or adverse immunological reactions. In preferred 
embodiments, biocompatible protein, polypeptide or peptide containing compositions will 
generally be mammalian proteins or peptides or synthetic proteins or peptides each essentially 
free from toxins, pathogens and harmful immunogens. 


B. Purification of HA4 Proteins 

Further aspects of the present invention concern the purification of an HA4 protein or 
polypeptide. The term "purified protein" as used herein, is intended to refer to an HA4 
composition isolatable from natural sources such as osteoblastic MC3T3-E1 cells and
undifferentiated ATDC5 cells, or recombinant host cells, wherein the HA4 is purified to any
degree relative to its naturally-obtainable state. It is contemplated that the purified HA4 proteins 
or polypeptides of the invention will generally possess HA4 activity. That is, they will have the 
capacity to promote osteoblast differentiation and/or bone formation. 

HA4 may be purified from extracts of various cells by immunoprecipitation using
polyclonal anti-HA4 antibodies or monoclonal antibodies (MAb) (see below). In one
embodiment, a cDNA encoding HA4 is expressed in a host cell, such as bacteria, yeast cells, 
insect cells, or mammalian cells, and the expressed proteins purified using antibodies against 
HA4. 

Various techniques suitable for use in protein purification will be well known to those of
skill in the art. These include, for example, precipitation with ammonium sulfate, PEG,
antibodies and the like or by heat denaturation, followed by centrifugation; chromatography
steps such as ion exchange, gel filtration, reverse phase, hydroxylapatite, lectin affinity and other
affinity chromatography steps; isoelectric focusing; gel electrophoresis; and combinations of 
such and other techniques. A specific example is the purification of HA4 using 
immunoprecipitation with anti-HA4 antibodies. 

Where the term "substantially purified" is used, this will refer to a composition in which
HA4 forms the major component of the composition, such as constituting about 50% of the
proteins in the composition or more. In preferred embodiments, a substantially purified protein 
will constitute more than 60%, 70%, 80%, 90%, 95% or 99% of the proteins in the composition. 

A polypeptide or protein that is "purified to homogeneity," as applied to the present
invention, means that the polypeptide or protein has a level of purity where the polypeptide or
protein is substantially free from other proteins and biological components. For example, a 
purified polypeptide or protein will often be sufficiently free of other protein components so that 
degradative sequencing may be performed successfully. 

Various methods for quantifying the degree of purification of the HA4 protein will be
known to those of skill in the art in light of the present disclosure. These include, for example,
determining the specific activity of an active fraction, or assessing the number of polypeptides 
within a fraction by gel electrophoresis. Assessing the number of polypeptides within a fraction 
by SDS/PAGE analysis will often be preferred in the context of the present invention, e.g., in 
assessing protein purity. 
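
For illustration, the specific-activity approach to quantifying purification can be expressed as a short calculation. The sketch below uses the conventional definitions (specific activity as activity units per milligram of total protein, and fold purification as the ratio of specific activities); the function names, the notion of an HA4 "activity unit," and all numerical values are hypothetical and are not measurements from the present disclosure.

```python
def specific_activity(activity_units: float, protein_mg: float) -> float:
    """Activity units per milligram of total protein in a fraction."""
    if protein_mg <= 0:
        raise ValueError("protein_mg must be positive")
    return activity_units / protein_mg


def purification_summary(crude: tuple[float, float],
                         fraction: tuple[float, float]) -> dict[str, float]:
    """Compare a purified fraction with the starting (crude) material.

    Each argument is a (total_activity_units, total_protein_mg) pair.
    """
    crude_sa = specific_activity(*crude)
    fraction_sa = specific_activity(*fraction)
    if crude_sa <= 0:
        raise ValueError("crude material must have measurable activity")
    return {
        "specific_activity": fraction_sa,
        "fold_purification": fraction_sa / crude_sa,
        "yield_percent": 100.0 * fraction[0] / crude[0],
    }


# Hypothetical numbers: crude extract vs. an immunoprecipitated fraction.
summary = purification_summary(crude=(1000.0, 500.0), fraction=(600.0, 6.0))
print(summary)
```

The yield figure shows the trade-off between degree of purification and total recovery of activity.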

As mentioned above, although preferred for use in certain embodiments, there is no
general requirement that the HA4 proteins or polypeptides always be provided in their most
purified state. Indeed, it is contemplated that less substantially purified proteins or polypeptides, 
which are nonetheless enriched in HA4 activity relative to the natural state, will have utility in 
certain embodiments. For example, less purified HA4 preparations may contain molecules that
are associated naturally with HA4. If so, this may, ultimately, lead to the identification of unique
molecules that associate with HA4 on the cell surfaces (e.g., co-receptors) or in the cytoplasm
(e.g., signaling components). 

Methods exhibiting a lower degree of relative purification may have advantages in total 
recovery of protein product, or in maintaining the activity of an expressed protein. Inactive
products also have utility in certain embodiments, such as, e.g., in antibody generation.

Partially purified HA4 fractions for use in such embodiments may be obtained by 
subjecting cells or cell extracts to one or a combination of the steps described. Substituting 
certain steps with improved equivalents is also contemplated to be useful. For example, it is
appreciated that a cation-exchange column chromatography performed utilizing an HPLC
apparatus will generally result in a greater "-fold" purification than the same technique utilizing 
a low pressure chromatography system. 

C. Biologically Functional Equivalents and Structural Equivalents 

Modifications may be made in the structure of HA4 and still obtain a molecule having 
like or otherwise desirable characteristics. For example, certain amino acids may be substituted 
for other amino acids in a protein structure without appreciable loss of interactive binding 
capacity with structures such as, for example, antigen-binding regions of antibodies or binding 
sites on substrate molecules, receptors, or osteoblasts. Since it is the interactive capacity and 
nature of a protein that defines that protein's biological functional activity, certain amino acid 
sequence substitutions can be made in a protein sequence (or, of course, its underlying DNA 
coding sequence) and nevertheless obtain a protein with like (agonistic) properties. Equally, the 
same considerations may be employed to create a protein or polypeptide with countervailing 
(e.g., antagonistic) properties. It is thus contemplated by the inventors that various changes may 
be made in the sequence of HA4 protein or polypeptide (or underlying DNA) without 
appreciable loss of their biological utility or activity. 

In terms of functional equivalents, it is also well understood by the skilled artisan that, 
inherent in the definition of a biologically functional equivalent protein or polypeptide, is the 
concept that there is a limit to the number of changes that may be made within a defined portion 
of the molecule and still result in a molecule with an acceptable level of equivalent biological 
activity. Biologically functional equivalent polypeptides are thus defined herein as those 
polypeptides in which certain, not most or all, of the amino acids may be substituted. Of course, 
a plurality of distinct proteins/polypeptides with different substitutions may be made and used in 
accordance with the invention. 

It is also well understood that where certain residues are shown to be particularly 
important to the biological or structural properties of a protein or polypeptide, e.g., residues in 
active sites, such residues may not generally be exchanged. Amino acid substitutions are 
generally based on the relative similarity of the amino acid side-chain substituents, for example, 
their hydrophobicity, hydrophilicity, charge, size, and the like. An analysis of the size, shape 
and type of the amino acid side-chain substituents reveals that arginine, lysine and histidine are
all positively charged residues. Accordingly, arginine, lysine and histidine; alanine, glycine and
serine; and phenylalanine, tryptophan and tyrosine; are defined herein as biologically
functional equivalents.

Conservative substitutions well known in the art include, for example, the changes of
alanine to serine; arginine to lysine; asparagine to glutamine or histidine; aspartate to glutamate;
cysteine to serine; glutamine to asparagine; glutamate to aspartate; glycine to proline; histidine
to asparagine or glutamine; isoleucine to leucine or valine; leucine to valine or isoleucine; lysine
to arginine, glutamine, or glutamate; methionine to leucine or isoleucine; phenylalanine to
tyrosine, leucine or methionine; serine to threonine; threonine to serine; tryptophan to tyrosine; 
tyrosine to tryptophan or phenylalanine; and valine to isoleucine or leucine. 

In making such changes, the hydropathic index of amino acids may be considered. Each 
amino acid has been assigned a hydropathic index on the basis of its hydrophobicity and
charge characteristics. These are: isoleucine (+4.5); valine (+4.2); leucine (+3.8); phenylalanine
(+2.8); cysteine/cystine (+2.5); methionine (+1.9); alanine (+1.8); glycine (-0.4); threonine
(-0.7); serine (-0.8); tryptophan (-0.9); tyrosine (-1.3); proline (-1.6); histidine (-3.2); glutamate
(-3.5); glutamine (-3.5); aspartate (-3.5); asparagine (-3.5); lysine (-3.9); and arginine (-4.5).

The importance of the hydropathic amino acid index in conferring interactive biological 
function on a protein is generally understood in the art (Kyte and Doolittle, 1982, incorporated
herein by reference). It is known that certain amino acids may be substituted for other amino
acids having a similar hydropathic index or score and still retain a similar biological activity. In 
making changes based upon the hydropathic index, the substitution of amino acids whose 
hydropathic indices are within ±2 is preferred, those which are within ±1 are particularly 
preferred, and those within ±0.5 are even more particularly preferred.
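
The tiered preference described above (within ±2, within ±1, within ±0.5) can be illustrated with a short Python sketch. The index values are those recited above, keyed by standard one-letter amino acid codes; the function name and the wording of the returned labels are hypothetical conveniences and carry no special significance.

```python
# Hydropathic index values as recited above (Kyte and Doolittle, 1982).
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def substitution_tier(original: str, replacement: str) -> str:
    """Classify a substitution by the difference in hydropathic index."""
    # Round to one decimal so binary floating-point noise cannot push a
    # difference such as 2.0 across a tier boundary.
    diff = round(abs(HYDROPATHY[original] - HYDROPATHY[replacement]), 1)
    if diff <= 0.5:
        return "even more particularly preferred"
    if diff <= 1.0:
        return "particularly preferred"
    if diff <= 2.0:
        return "preferred"
    return "outside the preferred ranges"


for pair in (("I", "V"), ("K", "R"), ("L", "A"), ("I", "R")):
    print(pair, substitution_tier(*pair))
```

The classification considers hydropathic index alone; residues shown to be critical to activity or structure would still be excluded from substitution regardless of the tier.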

It is also understood in the art that the substitution of like amino acids can be made 
effectively on the basis of hydrophilicity, particularly where the biological functional equivalent 
protein or polypeptide thereby created is intended for use in immunological embodiments, as in 
the present case. U.S. Patent 4,554,101, incorporated herein by reference, states that the greatest 
local average hydrophilicity of a protein, as governed by the hydrophilicity of its adjacent amino
acids, correlates with its immunogenicity and antigenicity, i.e., with a biological property of the 
protein. 

As detailed in U.S. Patent 4,554,101, the following hydrophilicity values have been 
assigned to amino acid residues: arginine (+3.0); lysine (+3.0); aspartate (+3.0 ± 1); glutamate
(+3.0 ± 1); serine (+0.3); asparagine (+0.2); glutamine (+0.2); glycine (0); threonine (-0.4);
proline (-0.5 ± 1); alanine (-0.5); histidine (-0.5); cysteine (-1.0); methionine (-1.3); valine (-1.5); 
leucine (-1.8); isoleucine (-1.8); tyrosine (-2.3); phenylalanine (-2.5); tryptophan (-3.4). In 
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making changes based upon similar hydrophilicity values, the substitution of amino acids whose 
hydrophilicity values are within ±0.5 are even more particularly preferred. 
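
The notion of "greatest local average hydrophilicity" can be sketched as a sliding-window average over a sequence. The values are the central values recited above (the ± 1 ranges are ignored); the six-residue window, the function name and the example sequence are assumptions made for illustration only.

```python
# Hydrophilicity values as recited in the text (central values only).
HYDROPHILICITY = {
    "R": 3.0, "K": 3.0, "D": 3.0, "E": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "T": -0.4, "P": -0.5, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "L": -1.8, "I": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def most_hydrophilic_window(sequence: str, window: int = 6):
    """Return (start_index, mean_value) of the most hydrophilic window.

    The window length of 6 is an assumption; ties keep the first window.
    """
    if len(sequence) < window:
        raise ValueError("sequence is shorter than the window")
    best_start, best_mean = 0, float("-inf")
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        mean = sum(HYDROPHILICITY[residue] for residue in segment) / window
        if mean > best_mean:
            best_start, best_mean = start, mean
    return best_start, best_mean


if __name__ == "__main__":
    # Hypothetical peptide, not an HA4 sequence.
    print(most_hydrophilic_window("LLLLLLKRDEKRLLLLLL"))
```

The window with the highest mean is a candidate antigenic region in the sense described above; it is a prediction only and does not replace empirical testing.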

While discussion has focused on functionally equivalent polypeptides arising from amino 
acid changes, it will be appreciated that these changes may be effected by alteration of the 
encoding DNA, taking into consideration also that the genetic code is degenerate and that two or
more codons may code for the same amino acid. A table of amino acids and their codons is 
presented herein for use in such embodiments, as well as for other uses, such as in the design of 
probes and primers and the like. 
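
Codon degeneracy can be illustrated with a minimal sketch built from the standard genetic code, which is the content of the codon table presented herein. The names `CODONS`, `translate` and `are_synonymous` are assumptions for illustration; stop codons are deliberately omitted, so only sense codons are handled.

```python
# Standard genetic code, sense codons only, written as RNA codons.
CODONS = {
    "A": "GCA GCC GCG GCU", "C": "UGC UGU", "D": "GAC GAU", "E": "GAA GAG",
    "F": "UUC UUU", "G": "GGA GGC GGG GGU", "H": "CAC CAU", "I": "AUA AUC AUU",
    "K": "AAA AAG", "L": "UUA UUG CUA CUC CUG CUU", "M": "AUG", "N": "AAC AAU",
    "P": "CCA CCC CCG CCU", "Q": "CAA CAG", "R": "AGA AGG CGA CGC CGG CGU",
    "S": "AGC AGU UCA UCC UCG UCU", "T": "ACA ACC ACG ACU",
    "V": "GUA GUC GUG GUU", "W": "UGG", "Y": "UAC UAU",
}

# Reverse lookup: codon -> one-letter amino acid code.
CODON_TO_AA = {
    codon: amino_acid
    for amino_acid, codons in CODONS.items()
    for codon in codons.split()
}


def translate(coding_sequence: str) -> str:
    """Translate a DNA or RNA coding sequence made of sense codons."""
    rna = coding_sequence.upper().replace("T", "U")
    if len(rna) % 3:
        raise ValueError("length must be a multiple of three")
    return "".join(CODON_TO_AA[rna[i:i + 3]] for i in range(0, len(rna), 3))


def are_synonymous(codon_a: str, codon_b: str) -> bool:
    """True when two codons encode the same amino acid."""
    a = codon_a.upper().replace("T", "U")
    b = codon_b.upper().replace("T", "U")
    return CODON_TO_AA[a] == CODON_TO_AA[b]


if __name__ == "__main__":
    print(translate("ATGGCTTGG"))          # three codons: AUG GCU UGG
    print(are_synonymous("CGU", "AGA"))    # both are arginine codons
```

Two DNA segments that differ only at synonymous codons encode the same polypeptide, which is why changes at the DNA level need not alter the amino acid sequence.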

Polypeptides corresponding to one or more antigenic determinants, or "epitopic core
regions," of HA4 can also be prepared. Such polypeptides should generally be at least five or six
amino acid residues in length, and may contain up to about 35-50 residues or so. While peptides
can be created by proteolytic cleavage, a more typical method is to synthesize the peptides.
Synthetic polypeptides will generally be about 35 residues long, which is the approximate upper
length limit of automated polypeptide synthesis machines, such as those available from Applied
Biosystems (Foster City, CA). Longer polypeptides may also be prepared, e.g., by recombinant
means.

U.S. Patent 4,554,101 (Hopp, incorporated herein by reference) teaches the identification
and preparation of epitopes from primary amino acid sequences on the basis of hydrophilicity.
Through the methods disclosed in Hopp, one of skill in the art would be able to identify epitopes
from within an amino acid sequence. Numerous scientific publications have also been devoted
to the prediction of secondary structure, and to the identification of epitopes, from analyses of
amino acid sequences (Chou and Fasman, 1974a,b; 1978a,b; 1979). Any of these may be used,
if desired, to supplement the teachings of Hopp in U.S. Patent 4,554,101. Moreover, computer
programs are currently available to assist with predicting antigenic portions and epitopic core
regions of proteins. Examples include those programs based upon the Jameson-Wolf analysis
(Jameson and Wolf, 1988; Wolf et al., 1988), the program PepPlot® (Brutlag et al., 1990;
Weinberger et al., 1985), and other new programs for protein tertiary structure prediction
(Fetrow and Bryant, 1993). Further commercially available software capable of carrying out
such analyses is termed MacVector (IBI, New Haven, CT).

In further embodiments, major antigenic determinants of a polypeptide may be identified
by an empirical approach in which portions of the gene encoding the polypeptide are expressed
in a recombinant host, and the resulting proteins tested for their ability to elicit an immune
response. For example, PCR™ can be used to prepare a range of polypeptides lacking
successively longer fragments of the C-terminus of the protein. The immunoactivity of each of
these polypeptides is determined to identify those fragments or domains of the polypeptide that
are immunodominant. Further studies in which only a small number of amino acids are removed
at each iteration then allow the location of the antigenic determinants of the polypeptide to be
more precisely determined.
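
The design of such a C-terminal deletion series can be sketched as follows. This only enumerates the truncated polypeptide sequences that would be expressed; the function name, parameters and the ten-residue example sequence are hypothetical and are not taken from the disclosure.

```python
def c_terminal_truncations(protein: str, step: int, min_length: int = 1) -> list:
    """List polypeptides lacking successively longer C-terminal fragments.

    Each entry is `step` residues shorter than the previous one, down to
    `min_length`. A large step gives a coarse first pass; a small step
    narrows down the location of a determinant.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    return [
        protein[:length]
        for length in range(len(protein) - step, min_length - 1, -step)
    ]


if __name__ == "__main__":
    example = "MKTAYIAKQR"  # hypothetical sequence, for illustration only
    print(c_terminal_truncations(example, step=3))                # coarse pass
    print(c_terminal_truncations(example, step=1, min_length=7))  # fine pass
```

If reactivity is lost between two constructs of the coarse pass, the fine pass over that interval indicates which residues are required.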

Once one or more such analyses are completed, polypeptides are prepared that contain at 
least the essential features of one or more antigenic determinants. The polypeptides are then 
employed in the generation of antisera against the polypeptide. Minigenes or gene fusions 
encoding these determinants can also be constructed and inserted into expression vectors by
standard methods, for example, using PCR™ cloning methodology.

In addition to the peptidyl compounds described herein, the inventors also contemplate 
that other sterically similar compounds may be formulated to mimic the key portions of the 
polypeptide structure. Such compounds, which may be termed peptidomimetics, may be used in 
the same manner as the polypeptides of the invention and hence are also functional equivalents. 

Certain mimetics that mimic elements of protein secondary structure are described in
Johnson et al. (1993). The underlying rationale behind the use of polypeptide mimetics is that
the polypeptide backbone of proteins exists chiefly to orientate amino acid side chains in such a 
way as to facilitate molecular interactions, such as those of antibody and antigen. A polypeptide 
mimetic is thus designed to permit molecular interactions similar to the natural molecule. 

Some successful applications of the polypeptide mimetic concept have focused on
mimetics of β-turns within proteins, which are known to be highly antigenic. Likely β-turn
structure within a polypeptide can be predicted by computer-based algorithms, as discussed
herein. Once the component amino acids of the turn are determined, mimetics can be
constructed to achieve a similar spatial orientation of the essential elements of the amino acid
side chains.

The generation of further structural equivalents or mimetics may be achieved by the 
techniques of modeling and chemical design known to those of skill in the art. The art of 
receptor modeling is now well known, and by such methods a chemical that binds to the 
osteoblast HA4 receptor can be designed and then synthesized. It will be understood that all 
such sterically similar constructs fall within the scope of the present invention.

D. Production of Antibodies Against HA4

Means for preparing and characterizing antibodies are well known in the art (see, e.g., 
Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory, 1988; incorporated herein 
by reference). The methods for generating monoclonal antibodies (MAbs) generally begin along 
the same lines as those for preparing polyclonal antibodies. Briefly, a polyclonal antibody is 
prepared by immunizing an animal with an immunogenic composition in accordance with the 
present invention (either with or without prior immunotolerizing, depending on the antigen 
composition and protocol being employed) and collecting antisera from that immunized animal. 

A wide range of animal species can be used for the production of antisera. Typically the 
animal used for production of antisera is a rabbit, a mouse, a rat, a hamster, a guinea pig or a
goat. Because of the relatively large blood volume of rabbits, a rabbit is a preferred choice for 
production of polyclonal antibodies. 

As is well known in the art, a given composition may vary in its immunogenicity. It is 
often necessary therefore to boost the host immune system, as may be achieved by coupling a 
peptide or polypeptide immunogen to a carrier. Exemplary and preferred carriers are keyhole 
limpet hemocyanin (KLH) and bovine serum albumin (BSA). Other albumins such as 
ovalbumin, mouse serum albumin or rabbit serum albumin can also be used as carriers. Means 
for conjugating a polypeptide to a carrier protein are well known in the art and include
glutaraldehyde, m-maleimidobenzoyl-N-hydroxysuccinimide ester, carbodiimide and
bis-diazotized benzidine.

As is also well known in the art, the immunogenicity of a particular immunogen 
composition can be enhanced by the use of non-specific stimulators of the immune response, 
known as adjuvants. Suitable adjuvants include all acceptable immunostimulatory compounds, 
such as cytokines, toxins or synthetic compositions. 

Adjuvants that may be used include IL-1, IL-2, IL-4, IL-7, IL-12, γ-interferon, GM-CSF,
BCG, aluminum hydroxide, MDP compounds, such as thr-MDP and nor-MDP, CGP (MTP-PE),
lipid A, and monophosphoryl lipid A (MPL). RIBI, which contains three components
extracted from bacteria, MPL, trehalose dimycolate (TDM) and cell wall skeleton (CWS) in a
2% squalene/Tween 80 emulsion, is also contemplated. MHC antigens may even be used.

Exemplary, often preferred adjuvants include complete Freund's adjuvant (a non-specific 
stimulator of the immune response containing killed Mycobacterium tuberculosis), incomplete 
Freund's adjuvant and aluminum hydroxide adjuvant.

The amount of immunogen composition used in the production of polyclonal antibodies
varies depending upon the nature of the immunogen as well as the animal used for immunization.
A variety of routes can be used to administer the immunogen (subcutaneous, intramuscular,
intradermal, intravenous and intraperitoneal). The production of polyclonal antibodies may be
monitored by sampling blood of the immunized animal at various points following immunization.

MAbs may be readily prepared through use of well-known techniques, such as those 
exemplified in U.S. Patent 4,196,265, incorporated herein by reference. Typically, this 
technique involves immunizing a suitable animal with a selected immunogen composition, e.g., a 
purified or partially purified HA4 protein, polypeptide or peptide (or any osteoblast composition,
if used after tolerization to common antigens). The immunizing composition is administered in a
manner effective to stimulate antibody producing cells. 

The methods for generating MAbs generally begin along the same lines as those for
preparing polyclonal antibodies. Rodents such as mice and rats are preferred animals; however,
the use of rabbit, sheep or frog cells is also possible. The use of rats may provide certain
advantages (Goding, 1986), but mice are preferred, with the BALB/c mouse being most
preferred as this is most routinely used and generally gives a higher percentage of stable fusions.
The inventors have generated the MAb against mouse HA4 in rats. This was primarily because
it is technically difficult to immunize mice with molecules of mouse origin. On the other hand, the
inventors will prefer mice for the generation of MAb against human HA4.

The animals are injected with antigen, generally as described above. The antigen may be
coupled to carrier molecules such as keyhole limpet hemocyanin if necessary. The antigen
would typically be mixed with adjuvant, such as Freund's complete or incomplete adjuvant.
Booster injections with the same antigen would occur at approximately two-week intervals.

Following immunization, somatic cells with the potential for producing antibodies,
specifically B lymphocytes (B cells), are selected for use in the MAb generating protocol. These
cells may be obtained from biopsied spleens, tonsils or lymph nodes, or from a peripheral blood 
sample. Spleen cells and peripheral blood cells are preferred, the former because they are a rich 
source of antibody-producing cells that are in the dividing plasmablast stage, and the latter 
because peripheral blood is accessible. 

Often, a panel of animals will have been immunized and the spleen of the animal with the
highest antibody titer will be removed and the spleen lymphocytes obtained by homogenizing the
spleen with a syringe. Typically, a spleen from an immunized mouse contains approximately
5 x 10^7 to 2 x 10^8 lymphocytes.

The antibody-producing B lymphocytes from the immunized animal are then fused with
cells of an immortal myeloma cell line, generally one of the same species as the animal that was
immunized. Myeloma cell lines suited for use in hybridoma-producing fusion procedures
preferably are non-antibody-producing, have high fusion efficiency, and have enzyme deficiencies
that render them incapable of growing in certain selective media which support the growth of
only the desired fused cells (hybridomas).

Any one of a number of myeloma cells may be used, as are known to those of skill in the
art (Goding, pp. 65-66, 1986; Campbell, pp. 75-83, 1984). For example, where the immunized
animal is a mouse, one may use P3-X63/Ag8, X63-Ag8.653, NS1/1.Ag 4 1, Sp210-Ag14, FO,
NSO/U, MPC-11, MPC11-X45-GTG 1.7 and S194/5XX0 Bul; for rats, one may use
R210.RCY3, Y3-Ag 1.2.3, IR983F and 4B210; and U-266, GM1500-GRG2, LICR-LON-HMy2
and UC729-6 are all useful in connection with human cell fusions. One preferred
murine myeloma cell is the NS-1 myeloma cell line (also termed P3-NS-1-Ag4-1), which is
readily available from the NIGMS Human Genetic Mutant Cell Repository by requesting cell line
repository number GM3573. Another mouse myeloma cell line that may be used is the
8-azaguanine-resistant murine myeloma SP2/0 non-producer cell line.

II. DNA and RNA Segments for HA4 
A. DNA Segments 

Important aspects of the present invention concern isolated DNA segments and
recombinant vectors encoding HA4, and the creation and use of recombinant host cells that
express HA4 through the application of DNA technology. More specifically, the present
invention concerns mammalian DNA segments, isolated away from other mammalian genomic
DNA segments or total chromosomes. Preferred sources for the HA4 DNA segments of the
invention are human gene sequences. In cloning an HA4 sequence of the invention, one may
advantageously choose an established osteoblast line. But other sources will be equally 
appropriate, such as cDNA or genomic libraries. The DNA segments of the invention are 
capable of conferring HA4-like activity or properties, such as defined herein below, to a 
recombinant host cell when incorporated into the recombinant host cell. 

As used herein, the term "DNA segment" refers to a DNA molecule that has been
isolated substantially free of total genomic DNA and chromosomes of a particular species.
Therefore, a DNA segment encoding HA4 refers to a DNA segment that contains HA4 coding
sequences yet is isolated away from, or purified free from, total genomic DNA of tissues known
to contain relatively large numbers of osteoblasts, or of the BMP2-treated C2C12 line. 

A DNA segment comprising an isolated or purified HA4 gene also refers to a DNA 
segment including HA4 coding sequences and, in certain aspects, regulatory sequences, isolated 
substantially away from other naturally occurring genes or protein encoding sequences. In this
respect, the term "gene" is used for simplicity to refer to a DNA segment that encodes a
polypeptide or a functional protein. As will be understood by those in the art, this functional
term includes genomic sequences, cDNA sequences and smaller engineered gene segments
that express, or may be adapted to express, proteins, polypeptides or peptides. 

"Isolated substantially away from other coding sequences" means that the gene of
interest, in this case HA4, forms the significant part of the coding region of the DNA segment,
and that the DNA segment does not contain large portions of naturally-occurring coding DNA,
such as large chromosomal fragments or other functional genes or cDNA coding regions. Of
course, this refers to the DNA segment as originally isolated, and does not exclude genes or
coding regions later added to the segment by the hand of man.

In particular embodiments, the invention concerns isolated DNA segments and 
recombinant vectors incorporating DNA sequences that encode an HA4 protein or polypeptide 
that includes within its amino acid sequence an amino acid sequence in accordance with SEQ ID 
NO:2, corresponding to human or mammalian HA4. 

In certain embodiments, the invention concerns isolated DNA segments and recombinant
vectors that encode a protein or polypeptide that includes within its amino acid sequence an
amino acid sequence essentially as set forth in SEQ ID NO:2. Naturally, where the DNA
segment or vector encodes a full length HA4 protein, or is intended for use in expressing the
HA4 protein, the most preferred sequences are those that are essentially as set forth in SEQ ID
NO:2.

The term "a sequence essentially as set forth in SEQ ID NO:2" means that the sequence 
substantially corresponds to a portion of SEQ ID NO:2 and has relatively few amino acids that 
are not identical to, or a biologically functional equivalent of, the amino acids of SEQ ID NO:2.
The term "biologically functional equivalent" is well understood in the art and is further defined
in detail herein. Accordingly, sequences that have between about 70% and about 80%; or more
preferably, between about 81% and about 90%; or even more preferably, between about 91% 
and about 99%; of amino acids that are identical or functionally equivalent to the amino acids of 
SEQ ID NO:2 will be sequences that are "essentially as set forth in SEQ ID NO:2." 
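
The percentage ranges above can be illustrated with a simple position-by-position comparison. This sketch assumes two pre-aligned sequences of equal length and counts only identical residues; it does not score functionally equivalent residues, and the function names and example peptides are hypothetical.

```python
def percent_identity(candidate: str, reference: str) -> float:
    """Percentage of positions at which two aligned sequences are identical."""
    if not reference or len(candidate) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference)


def identity_tier(percent: float) -> str:
    """Map a percentage onto the ranges recited in the text."""
    if percent >= 91:
        return "even more preferred"
    if percent >= 81:
        return "more preferred"
    if percent >= 70:
        return "essentially as set forth"
    return "outside the stated ranges"


if __name__ == "__main__":
    reference = "ACDEFGHIKL"   # hypothetical peptide, not SEQ ID NO:2
    candidate = "ACDEFGHIKV"   # one substitution
    pct = percent_identity(candidate, reference)
    print(pct, identity_tier(pct))
```

A fuller comparison would first align the sequences and would also credit biologically functional equivalents, as the definition above permits.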

In other embodiments, the invention concerns isolated DNA segments and recombinant
vectors that include within their sequence a nucleic acid sequence essentially as set forth in SEQ
ID NO:1. The term "essentially as set forth in SEQ ID NO:1" is used in the same sense as
described above and means that the nucleic acid sequence substantially corresponds to a portion
of SEQ ID NO:1 and has relatively few codons that are not identical, or functionally equivalent,
to the codons of SEQ ID NO:1. The term "functionally equivalent codon" is used herein to refer
to codons that encode the same amino acid, such as the six codons for arginine or serine, and
also refers to codons that encode biologically equivalent amino acids. Table 1 sets forth the
amino acids and codons which encode each amino acid.

TABLE 2

Amino Acids                  Codons

Alanine         Ala   A      GCA GCC GCG GCU
Cysteine        Cys   C      UGC UGU
Aspartic acid   Asp   D      GAC GAU
Glutamic acid   Glu   E      GAA GAG
Phenylalanine   Phe   F      UUC UUU
Glycine         Gly   G      GGA GGC GGG GGU
Histidine       His   H      CAC CAU
Isoleucine      Ile   I      AUA AUC AUU
Lysine          Lys   K      AAA AAG
Leucine         Leu   L      UUA UUG CUA CUC CUG CUU
Methionine      Met   M      AUG
Asparagine      Asn   N      AAC AAU
Proline         Pro   P      CCA CCC CCG CCU
Glutamine       Gln   Q      CAA CAG
Arginine        Arg   R      AGA AGG CGA CGC CGG CGU
Serine          Ser   S      AGC AGU UCA UCC UCG UCU
Threonine       Thr   T      ACA ACC ACG ACU
Valine          Val   V      GUA GUC GUG GUU
Tryptophan      Trp   W      UGG
Tyrosine        Tyr   Y      UAC UAU

It is within the scope of the invention in certain aspects that high level protein production 
may be achieved by reducing criteria for osteoblast differentiation. In certain embodiments it is
within the invention to produce proteins lacking activity. Such proteins might be useful in very 
high volume to raise antibodies to the protein. 

It will also be understood that amino acid and nucleic acid sequences may include 
additional residues, such as additional N- or C-terminal amino acids or 5' or 3' sequences, and 
yet still be essentially as set forth in one of the sequences disclosed herein, so long as the
sequence meets the criteria set forth above, including the maintenance of osteoblast
differentiation activity where protein expression is concerned. The addition of terminal
sequences particularly applies to nucleic acid sequences that may, for example, include various
non-coding sequences flanking either of the 5' or 3' portions of the coding region or may include 
various internal sequences, i.e., introns, which are known to occur within genes. 

Suitable high stringency hybridization conditions will be well known to those of skill in 
the art and are clearly set forth herein, for example conditions such as relatively low salt and/or
high temperature conditions, such as provided by 0.02M-0.15M NaCl at temperatures of 50°C to 
70°C, for applications requiring high selectivity. Such relatively stringent conditions tolerate 
little, if any, mismatch between the probe and the template or target strand, and would be 
particularly suitable for isolating HA4 genes. 

Naturally, the present invention also encompasses DNA segments that are
complementary, or essentially complementary, to the sequence set forth in SEQ ID NO:1.
Nucleic acid sequences that are "complementary" are those that are capable of base-pairing
according to the standard Watson-Crick complementarity rules. That is, the larger purines
will always base pair with the smaller pyrimidines to form only combinations of guanine paired
with cytosine (G:C) and adenine paired with either thymine (A:T), in the case of DNA, or
uracil (A:U), in the case of RNA.

As used herein, the term "complementary sequences" means nucleic acid sequences that
are substantially complementary, as may be assessed by the same nucleotide comparison set
forth above, or as defined as being capable of hybridizing to the nucleic acid segment of SEQ ID
NO:1 under relatively stringent conditions such as those described herein. As such, these
complementary sequences are substantially complementary over their entire length and have
very few base mismatches. For example, nucleic acid sequences of six bases in length may be
termed complementary when they hybridize at five out of six positions with only a single
mismatch. Naturally, nucleic acid sequences which are "completely complementary" will be
nucleic acid sequences which are entirely complementary throughout their entire length and have
no base mismatches. Equivalents will show transcriptional activity. This is one feature which
will distinguish them from non-HA4 nucleic acid sequences.
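
The six-base example can be made concrete with a short sketch that counts mismatches between two antiparallel strands under the Watson-Crick pairing rules stated above. The function name and the example oligonucleotides are assumptions for illustration only.

```python
# Permitted Watson-Crick pairs for DNA (A:T, G:C) and RNA (A:U).
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def count_mismatches(strand: str, other: str) -> int:
    """Count non-pairing positions between two antiparallel strands.

    Both strands are written 5' to 3', so position i of `strand` is
    compared with position i counted from the 3' end of `other`.
    """
    if len(strand) != len(other):
        raise ValueError("strands must be the same length")
    return sum((x, y) not in PAIRS for x, y in zip(strand, reversed(other)))


if __name__ == "__main__":
    probe = "ATGCGT"
    print(count_mismatches(probe, "ACGCAT"))  # fully complementary strand
    print(count_mismatches(probe, "ACGCAA"))  # pairs at five of six positions
```

Under the definition above, the second pair would still be termed complementary, while only the first is "completely complementary".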

Antisense constructs are oligo- or polynucleotides comprising complementary
nucleotides to the coding segment of a DNA molecule, such as a gene or cDNA, including
the exons, introns and exon:intron boundaries of a gene. Antisense molecules are designed to
inhibit the transcription, translation or both, of a given gene or construct, such that the levels of
the resultant protein product are reduced or diminished. Antisense RNA constructs, or DNA
encoding such antisense RNAs, may be employed to inhibit gene transcription or translation or
both within a host cell, either in vitro or in vivo, such as within a host animal, including a human
subject.

In other aspects, the invention may comprise use of a ribozyme. HA4 nucleic acids may 
be constructed or isolated which, when transcribed, produce RNA enzymes - ribozymes - that 
can act as endoribonucleases and catalyze the cleavage of RNA molecules with selected
sequences. The cleavage of selected messenger RNAs can result in the reduced production of
their encoded polypeptide products. These genes may be used to prepare one or more novel
cells, tissues and organisms which possess them. The transgenic cells, tissues or organisms may
possess reduced levels of polypeptides including, but not limited to, the polypeptides cited
above.

B. Hybridization Probes 

The nucleic acid segments of the present invention, regardless of the length of the coding
sequence itself, may be combined with other DNA sequences, such as promoters,
polyadenylation signals, additional restriction enzyme sites, multiple cloning sites, other coding
segments, and the like, such that their overall length may vary considerably. It is therefore
contemplated that a nucleic acid fragment of almost any length may be employed, with the total
length preferably being limited by the ease of preparation and use in the intended recombinant
DNA protocol. In addition to their use in directing the expression of the HA4 protein, the
nucleic acid sequences disclosed herein also have a variety of other uses. For example, they also
have utility as probes or primers in nucleic acid hybridization embodiments. As such, it is
contemplated that nucleic acid segments that comprise a sequence region that consists of at least
a 14 nucleotide-long contiguous sequence that has the same sequence as, or is complementary to,
a 14 nucleotide-long contiguous sequence of SEQ ID NO:1, will find particular utility. Longer
contiguous identical or complementary sequences will also be of use in certain embodiments.
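
The 14-nucleotide criterion can be sketched as a substring test against a reference sequence and its reverse complement. SEQ ID NO:1 is not reproduced in this passage, so the reference used below is a hypothetical stand-in, and the function names are assumptions for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def shares_contiguous_stretch(candidate: str, reference: str, length: int = 14) -> bool:
    """True if any `length`-nt window of `candidate` is identical or
    complementary to a contiguous stretch of `reference`."""
    targets = (reference, reverse_complement(reference))
    for start in range(len(candidate) - length + 1):
        window = candidate[start:start + length]
        if any(window in target for target in targets):
            return True
    return False


if __name__ == "__main__":
    reference = "ATGGCCAAGCTTGGATCCGAATTCTAGA"  # hypothetical, not SEQ ID NO:1
    same_strand = reference[5:19]               # 14 nt, identical stretch
    opposite = reverse_complement(same_strand)  # 14 nt, complementary stretch
    print(shares_contiguous_stretch(same_strand, reference))
    print(shares_contiguous_stretch(opposite, reference))
    print(shares_contiguous_stretch(reference[5:18], reference))  # only 13 nt
```

Raising the `length` argument corresponds to the longer contiguous stretches that are said to be of use in certain embodiments.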

It will be readily understood that "intermediate lengths," in this context, means any
length between the quoted ranges, such as 14, 15, 16, 17, 18, 19, 20, etc.; 21, 22, 23, etc.; 30, 31,
32, etc.; 50, 51, 52, 53, etc.; 100, 101, 102, 103, etc.; 150, 151, 152, 153, etc.; including all
integers through the 200-500; 500-1,000; 1,000-2,000; 2,000-3,000; 3,000-5,000; 5,000-10,000
ranges, up to and including sequences of about 12,001, 12,002, 13,001, 13,002 and the like.

The ability of such nucleic acid probes to specifically hybridize to HA4 encoding 
sequences will enable them to be of use in detecting the presence of complementary sequences in 
a given sample. However, other uses are envisioned, including the use of the sequence
information for the preparation of mutant species primers, or primers for use in preparing other
genetic constructions. 

Nucleic acid molecules having sequence regions consisting of contiguous nucleotide
stretches of 10, 20, 30, 40, 50, or even of 100-200 nucleotides or so, identical or complementary
to SEQ ID NO:1, are particularly contemplated as hybridization probes for use in, e.g., Southern
and northern blotting. The inventors have also identified the sequence of genomic DNA for
human HA4. The total size of the fragment, as well as the size of the complementary stretch(es),
will ultimately depend on the intended use or application of the particular nucleic acid segment.
Smaller fragments will generally find use in hybridization embodiments, wherein the length of
the contiguous complementary region may be varied, such as between about 10 and about 100
nucleotides, but larger contiguous complementary stretches may be used.

The use of a hybridization probe of about 10-14 nucleotides in length allows the
formation of a duplex molecule that is both stable and selective. Molecules having contiguous
complementary sequences over stretches greater than 10 bases in length are generally preferred,
though, in order to increase stability and selectivity of the hybrid, and thereby improve the
quality and degree of specific hybrid molecules obtained, one will generally prefer to design
nucleic acid molecules having gene-complementary stretches of 15 to 20 contiguous nucleotides,
or even longer where desired.

Hybridization probes may be selected from any portion of any of the sequences disclosed
herein. All that is required is to review the sequence set forth in SEQ ID NO:1 and to select any
continuous portion of the sequence, from about 10 nucleotides in length up to and including the
full length sequence, that one wishes to utilize as a probe or primer. The choice of probe and
primer sequences may be governed by various factors. By way of example only, one
may wish to employ primers from towards the termini of the total sequence, or from the ends of
the functional domain-encoding sequences, in order to amplify further DNA; one may employ
probes corresponding to the entire DNA, or to the zinc finger region, or to the proline-rich
sequence to clone HA4-type genes from other species or to clone further HA4-like or
homologous genes from any species including human; and one may employ wild-type and
mutant probes or primers with sequences centered around the zinc finger or proline-rich
sequence to screen DNA samples for HA4. Moreover, one may employ probes or primers with
sequences centered around the different HA4 isoforms.

The process of selecting and preparing a nucleic acid segment that includes a contiguous
sequence from within SEQ ID NO:1 may alternatively be described as preparing a nucleic acid
fragment. Of course, fragments may also be obtained by other techniques such as, e.g., by
mechanical shearing or by restriction enzyme digestion. Small nucleic acid segments or
fragments may be readily prepared by, for example, directly synthesizing the fragment by
chemical means, as is commonly practiced using an automated oligonucleotide synthesizer.
Also, fragments may be obtained by application of nucleic acid reproduction technology, such as
the PCR™ technology of U.S. Patent 4,683,202 and U.S. Patent 4,682,195 (each incorporated
herein by reference), by introducing selected sequences into recombinant vectors for
recombinant production, and by other recombinant DNA techniques generally known to those of
skill in the art of molecular biology.

Accordingly, the nucleotide sequences of the invention may be used for their ability to
selectively form duplex molecules with complementary stretches of HA4 genes or cDNAs.
Depending on the application envisioned, one will desire to employ varying conditions of
hybridization to achieve varying degrees of selectivity of probe towards target sequence. For
applications requiring high selectivity, one will typically desire to employ stringent conditions to
form the hybrids, e.g., 0.02M-0.15M NaCl at temperatures of 50°C to 70°C. Such selective
conditions tolerate little, if any, mismatch between the probe and the template or target strand, 
and would be particularly suitable for isolating HA4 genes. 

Of course, for some applications, for example, where one desires to prepare mutants
employing a mutant primer strand hybridized to an underlying template or where one seeks to
isolate HA4 encoding sequences from related species, functional equivalents, or the like, less
stringent hybridization conditions will typically be needed in order to allow formation of the
heteroduplex. In these circumstances, one may desire to employ conditions such as 0.15M-1.0M
salt, at temperatures ranging from 20°C to 55°C. Cross-hybridizing species can thereby be
readily identified as positively hybridizing signals with respect to control hybridizations. In any
case, it is generally appreciated that conditions can be rendered more stringent by decreasing
NaCl concentrations or by the addition of increasing amounts of formamide, which serves to
destabilize the hybrid duplex in the same manner as increased temperature. Thus, hybridization
conditions can be readily manipulated, and the choice of conditions will generally depend
on the desired results.
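
The two sets of conditions recited above can be summarized in a small lookup. The salt and temperature ranges are those stated in the text; the function name, the returned labels and the handling of the overlap (0.15 M salt at 50°C to 55°C is reported as high stringency) are assumptions made for illustration, and formamide is not modeled.

```python
def classify_stringency(salt_molar: float, temp_c: float) -> str:
    """Classify hybridization conditions using the ranges given in the text.

    High stringency: 0.02-0.15 M NaCl at 50-70 degrees C.
    Low stringency:  0.15-1.0 M salt at 20-55 degrees C.
    The high-stringency range is checked first, so conditions that fall
    in the small overlap of the two ranges are reported as "high".
    """
    if 0.02 <= salt_molar <= 0.15 and 50 <= temp_c <= 70:
        return "high"
    if 0.15 <= salt_molar <= 1.0 and 20 <= temp_c <= 55:
        return "low"
    return "unclassified"


if __name__ == "__main__":
    print(classify_stringency(0.1, 65))   # low salt, high temperature
    print(classify_stringency(0.5, 37))   # higher salt, lower temperature
    print(classify_stringency(2.0, 37))   # outside both recited ranges
```

Lowering the salt concentration or raising the temperature moves a protocol toward the high-stringency range, consistent with the passage above.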

In certain embodiments, it will be advantageous to employ nucleic acid sequences of the
present invention in combination with an appropriate means, such as a label, for determining
hybridization. A wide variety of appropriate indicator means are known in the art, including
fluorescent, radioactive, enzymatic or other ligands, such as avidin/biotin, which are capable of
giving a detectable signal. In preferred embodiments, one will likely desire to employ a
fluorescent label or an enzyme tag, such as urease, alkaline phosphatase or peroxidase, instead of
radioactive or other environmentally undesirable reagents. In the case of enzyme tags,
colorimetric indicator substrates are known that can be employed to provide a means, visible to
the human eye or detectable spectrophotometrically, to identify specific hybridization with
complementary nucleic acid-containing samples.

In general, it is envisioned that the hybridization probes described herein will be useful 
both as reagents in solution hybridization as well as in embodiments employing a solid phase. In 
embodiments involving a solid phase, the test DNA (or RNA) is adsorbed or otherwise affixed to 
a selected matrix or surface. This fixed, single-stranded nucleic acid is then subjected to specific 
hybridization with selected probes under desired conditions. The selected conditions will depend 
on the particular circumstances based on the particular criteria required (depending, for example, 
on the G+C content, type of target nucleic acid, source of nucleic acid, size of hybridization 
probe, etc.). Following washing of the hybridized surface so as to remove nonspecifically bound 
probe molecules, specific hybridization is detected, or even quantified, by means of the label. 

It will also be understood that this invention is not limited to the particular nucleic acid 
and amino acid sequences of SEQ ID NOS:1 and 2. Recombinant vectors and isolated DNA 
segments may therefore variously include the HA4 coding regions themselves, coding regions 
bearing selected alterations or modifications in the basic coding region, or they may encode 
larger polypeptides that nevertheless include HA4 coding regions or may encode biologically 
functional equivalent proteins or polypeptides that have variant amino acid sequences. 

The DNA segments of the present invention encompass biologically functional 
equivalent HA4 proteins and polypeptides. Such sequences may arise as a consequence of codon 
redundancy and functional equivalency that are known to occur naturally within nucleic acid 
sequences and the proteins thus encoded. Alternatively, functionally equivalent proteins or 
polypeptides may be created via the application of recombinant DNA technology, in which 
changes in the protein structure may be engineered, based on considerations of the properties of 
the amino acids being exchanged. Changes designed by man may be introduced through the 
application of site-directed mutagenesis techniques, e.g., to introduce improvements to the 
antigenicity of the protein or to test HA4 mutants in order to examine transcriptional activity at 
the molecular level. 
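
As a minimal sketch of the codon redundancy referred to above, the following translates two 
different DNA segments with the standard genetic code and shows that they encode the same 
peptide. The two short sequences are hypothetical and are not HA4 sequences; the function 
and variable names are likewise introduced only for illustration. 

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence from its first codon up to the first stop codon."""
    dna = dna.upper()
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Two hypothetical segments that differ at several nucleotide positions.
seq_a = "ATGAAACTGTCTTAA"
seq_b = "ATGAAGTTAAGCTGA"

print(translate(seq_a), translate(seq_b), translate(seq_a) == translate(seq_b))
```

Because several codons specify the same amino acid, segments that differ in nucleotide 
sequence can nevertheless encode an identical polypeptide. 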

If desired, one may also prepare fusion proteins and polypeptides, e.g., where the HA4 
coding regions are aligned within the same expression unit with other proteins or polypeptides 
having desired functions, such as for purification or immunodetection purposes (e.g., proteins 
that may be purified by affinity chromatography or identified by enzyme label coding regions, 
respectively). 

C. Recombinant Vectors and Protein Expression 

Recombinant vectors form important further aspects of the present invention. 
Particularly useful vectors are contemplated to be those vectors in which the coding portion of 
the DNA segment, whether encoding a full length protein or smaller polypeptide, is positioned 
under the control of a promoter. The promoter may be in the form of the promoter that is 
naturally associated with an HA4 gene, e.g., in osteoblasts, as may be obtained by isolating the 5' 
non-coding sequences located upstream of the coding segment or exon, for example, using 
recombinant cloning and/or PCR™ technology, in connection with the compositions disclosed 
herein (PCR™ technology is disclosed in U.S. Patent 4,683,202 and U.S. Patent 4,683,195, each 
incorporated herein by reference). Alternatively, the promoter may be from a "heterologous" source, 
i.e., not the native HA4 promoter. 

1. Promoters and Enhancers 

The promoters and enhancers that control the transcription of protein encoding genes in 
mammalian cells are composed of multiple genetic elements. The cellular machinery is able to 
gather and integrate the regulatory information conveyed by each element, allowing different 
genes to evolve distinct, often complex patterns of transcriptional regulation. Tables 3 and 4 
describe suitable promoter/enhancer elements. 

The term promoter will be used here to refer to a group of transcriptional control modules 
that are clustered around the initiation site for RNA polymerase II. Much of the thinking about 
how promoters are organized derives from analyses of several viral promoters, including those 
for the HSV thymidine kinase (tk) and SV40 early transcription units. These studies, augmented 
by more recent work, have shown that promoters are composed of discrete functional modules, 
each consisting of approximately 7-20 bp of DNA, and containing one or more recognition sites 
for transcriptional activator proteins. At least one module in each promoter functions to position 
the start site for RNA synthesis. The best known example of this is the TATA box, but in some 
promoters lacking a TATA box, such as the promoter for the mammalian terminal 
deoxynucleotidyl transferase gene and the promoter for the SV40 late genes, a discrete element 
overlying the start site itself helps to fix the place of initiation. 

Additional promoter elements regulate the frequency of transcriptional initiation. 
Typically, these are located in the region 30-110 bp upstream of the start site, although a number 
of promoters have recently been shown to contain functional elements downstream of the start 
site as well. The spacing between elements is flexible, so that promoter function is preserved 
when elements are inverted or moved relative to one another. In the tk promoter, the spacing 
between elements can be increased to 50 bp apart before activity begins to decline. Depending 
on the promoter, it appears that individual elements can function either cooperatively or 
independently to activate transcription. 

Enhancers were originally detected as genetic elements that increased transcription from 
a promoter located at a distant position on the same molecule of DNA. This ability to act over a 
large distance had little precedent in classic studies of prokaryotic transcriptional regulation. 
Subsequent work showed that regions of DNA with enhancer activity are organized much like 
promoters. That is, they are composed of many individual elements, each of which binds to one 
or more transcriptional proteins. 

TABLE 3 

PROMOTERS | REFERENCES 
Immunoglobulin Heavy Chain | Gilles et al., 1983; Grosschedl and Baltimore, 1985; Atchinson and Perry, 1986, 1987; Imler et al., 1987; Weinberger et al., 1988; Kiledjian et al., 1988; Porton et al., 1990 
Immunoglobulin Light Chain | Queen and Baltimore, 1983; Picard and Schaffner, 1984 
T-Cell Receptor | Luria et al., 1987; Winoto and Baltimore, 1989; Redondo et al., 1990 
HLA DQ α and DQ β | Sullivan and Peterlin, 1987 
β-Interferon | Goodbourn et al., 1986; Fujita et al., 1987; Goodbourn and Maniatis, 1985 
Interleukin-2 | Greene et al., 1989 
Interleukin-2 Receptor | Greene et al., 1989; Lin et al., 1990 
MHC Class II 5 | Koch et al., 1989 
MHC Class II HLA-DRα | Sherman et al., 1989 
β-Actin | Kawamoto et al., 1988; Ng et al., 1989 
Muscle Creatine Kinase | Jaynes et al., 1988; Horlick and Benfield, 1989; Johnson et al., 1989a 
Prealbumin (Transthyretin) | Costa et al., 1988 
Elastase I | Ornitz et al., 1987 
Metallothionein | Karin et al., 1987; Culotta and Hamer, 1989 
Collagenase | Pinkert et al., 1987; Angel et al., 1987 
Albumin Gene | Pinkert et al., 1987; Tronche et al., 1989, 1990 
α-Fetoprotein | Godbout et al., 1988; Campere and Tilghman, 1989 
t-Globin | Bodine and Ley, 1987; Perez-Stable and Constantini, 1990 
β-Globin | Trudel and Constantini, 1987 
c-fos | Cohen et al., 1987 
c-HA-ras | Triesman, 1986; Deschamps et al., 1985 
Insulin | Edlund et al., 1985 
Neural Cell Adhesion Molecule (NCAM) | Hirsch et al., 1990 
α1-Antitrypsin | Latimer et al., 1990 
H2B (TH2B) Histone | Hwang et al., 1990 
Mouse or Type I Collagen | Ripe et al., 1989 
Glucose-Regulated Proteins (GRP94 and GRP78) | Chang et al., 1989 
Rat Growth Hormone | Larsen et al., 1986 
Human Serum Amyloid A (SAA) | Edbrooke et al., 1989 
Troponin I (TN I) | Yutzey et al., 1989 
Platelet-Derived Growth Factor | Pech et al., 1989 
Duchenne Muscular Dystrophy | Klamut et al., 1990 
SV40 | Banerji et al., 1981; Moreau et al., 1981; Sleigh and Lockett, 1985; Firak and Subramanian, 1986; Herr and Clarke, 1986; Imbra and Karin, 1986; Kadesch and Berg, 1986; Wang and Calame, 1986; Ondek et al., 1987; Kuhl et al., 1987; Schaffner et al., 1988 
Polyoma | Swartzendruber and Lehman, 1975; Vasseur et al., 1980; Katinka et al., 1980, 1981; Tyndell et al., 1981; Dandolo et al., 1983; deVilliers et al., 1984; Hen et al., 1986; Satake et al., 1988; Campbell and Villarreal, 1988 
Retroviruses | Kriegler and Botchan, 1982, 1983; Levinson et al., 1982; Kriegler et al., 1983, 1984a,b, 1988; Bosze et al., 1986; Miksicek et al., 1986; Celander and Haseltine, 1987; Thiesen et al., 1988; Celander et al., 1988; Choi et al., 1988; Reisman and Rotter, 1989 
Papilloma Virus | Campo et al., 1983; Lusky et al., 1983; Spandidos and Wilkie, 1983; Spalholz et al., 1985; Lusky and Botchan, 1986; Cripe et al., 1987; Gloss et al., 1987; Hirochika et al., 1987; Stephens and Hentschel, 1987; Glue et al., 1988 
Hepatitis B Virus | Bulla and Siddiqui, 1986; Jameel and Siddiqui, 1986; Shaul and Ben-Levy, 1987; Spandau and Lee, 1988; Vannice and Levinson, 1988 
Human Immunodeficiency Virus | Muesing et al., 1987; Hauber and Cullan, 1988; Jakobovits et al., 1988; Feng and Holland, 1988; Takebe et al., 1988; Rosen et al., 1988; Berkhout et al., 1989; Laspia et al., 1989; Sharp and Marciniak, 1989; Braddock et al., 1989 
Cytomegalovirus | Weber et al., 1984; Boshart et al., 1985; Foecking and Hofstetter, 1986 
Gibbon Ape Leukemia Virus | Holbrook et al., 1987; Quinn et al., 1989 


TABLE 4 

Element | Inducer | References 
MT II | Phorbol Ester (TPA), Heavy metals | Palmiter et al., 1982; Haslinger and Karin, 1985; Searle et al., 1985; Stuart et al., 1985; Imagawa et al., 1987; Karin et al., 1987; Angel et al., 1987b; McNeall et al., 1989 
MMTV (mouse mammary tumor virus) | Glucocorticoids | Huang et al., 1981; Lee et al., 1981; Majors and Varmus, 1983; Chandler et al., 1983; Lee et al., 1984; Ponta et al., 1985; Sakai et al., 1986 
β-Interferon | Poly(rI)x Poly(rc) | Tavernier et al., 1983 
Adenovirus 5 E2 | E1A | Imperiale and Nevins, 1984 
Collagenase | Phorbol Ester (TPA) | Angel et al., 1987a 
Stromelysin | Phorbol Ester (TPA) | Angel et al., 1987b 
SV40 | Phorbol Ester (TPA) | Angel et al., 1987b 
Murine MX Gene | Interferon, Newcastle Disease Virus | 
GRP78 Gene | A23187 | Resendez et al., 1988 
α-2-Macroglobulin | IL-6 | Kunz et al., 1989 
Vimentin | Serum | Rittling et al., 1989 
MHC Class I Gene H-2κb | Interferon | Blanar et al., 1989 
HSP70 | E1A, SV40 Large T Antigen | Taylor et al., 1989; Taylor and Kingston, 1990a,b 
Proliferin | Phorbol Ester-TPA | Mordacq and Linzer, 1989 
Tumor Necrosis Factor | PMA | Hensel et al., 1989 
Thyroid Stimulating Hormone α Gene | Thyroid Hormone | Chatterjee et al., 1989 

It is understood in the art that to bring a coding sequence under the control of a promoter, 
one positions the 5' end of the transcription initiation site of the transcriptional reading frame of 
the protein between about 1 and about 50 nucleotides "downstream" of (i.e., 3' of) the chosen 
promoter. In addition, where eukaryotic expression is contemplated, one will also typically 
desire to incorporate into the transcriptional unit which includes the HA4 protein coding sequence, an 
appropriate polyadenylation site (e.g., 5'-AATAAA-3') if one was not contained within the 
original cloned segment. Typically, the poly-A addition site is placed about 30 to 2000 
nucleotides "downstream" of the termination site of the protein at a position prior to transcription 
termination. 
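
The placement guideline for the poly-A addition site can be checked computationally. The 
following is a minimal sketch, assuming a hypothetical construct sequence and hypothetical 
function and parameter names; it simply reports occurrences of the 5'-AATAAA-3' signal that 
lie about 30 to 2000 nucleotides downstream of the end of the stop codon. 

```python
def find_polya_signals(seq, stop_codon_end, min_dist=30, max_dist=2000, signal="AATAAA"):
    """Return 0-based start positions of `signal` lying min_dist..max_dist
    nucleotides downstream of `stop_codon_end` (index just past the stop codon)."""
    seq = seq.upper()
    hits = []
    start = seq.find(signal, stop_codon_end)
    while start != -1:
        distance = start - stop_codon_end
        if min_dist <= distance <= max_dist:
            hits.append(start)
        # Continue searching one base further on so overlapping matches are not missed.
        start = seq.find(signal, start + 1)
    return hits


# Hypothetical construct: a short coding segment followed by a 3' untranslated region.
coding_segment = "ATGGCTTAA"                       # ends with the stop codon TAA
three_prime_utr = "C" * 40 + "AATAAA" + "G" * 20   # signal placed 40 nt after the stop codon
construct = coding_segment + three_prime_utr

print(find_polya_signals(construct, stop_codon_end=len(coding_segment)))
```

If no position is returned, a polyadenylation site was not contained within the cloned 
segment in the expected window, and one may wish to add one as described above. 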


2. Expression Vectors 

As mentioned above, in connection with expression embodiments to prepare recombinant 
HA4 proteins and polypeptides, it is contemplated that longer DNA segments will most often be 
used, with DNA segments encoding the entire HA4 protein being most preferred. However, it 
will be appreciated that the use of shorter DNA segments to direct the expression of HA4 
polypeptides or epitopic core regions, such as may be used to generate anti-HA4 antibodies, also 
falls within the scope of the invention. 

Once a suitable (full length if desired) clone or clones have been obtained, whether they 
be cDNA based or genomic, one may proceed to prepare an expression system for the 
recombinant preparation of HA4. The engineering of DNA segment(s) for expression in a 
prokaryotic or eukaryotic system may be performed by techniques generally known to those of 
skill in recombinant expression. It is believed that virtually any expression system may be 
employed in the expression of HA4. 

It is proposed that transformation of host cells with DNA segments encoding the HA4 
protein will provide a convenient means for obtaining active HA4. However, separate 
expression followed by reconstitution is also certainly within the scope of the invention. 

Both cDNA and genomic sequences are suitable for eukaryotic expression, as the host 
cell will generally process the genomic transcripts to yield functional mRNA for translation into 
protein. Generally speaking, it may be more convenient to employ as the recombinant gene a 
cDNA version of the gene. It is believed that the use of a cDNA version will provide advantages 
in that the size of the gene will generally be much smaller and more readily employed to 
transfect the targeted cell than will a genomic gene, which will typically be up to an order of 
magnitude larger than the cDNA gene. However, the inventors do not exclude the possibility of 
employing a genomic version of a particular gene where desired. 

In addition, it is possible to express partial sequences, e.g., for the generation of 
antibodies against discrete portions of a gene product, even when the entire sequence of that 
gene product remains unknown. As noted herein, computer programs are available to aid in the 
selection of regions which have potential immunologic significance. For example, software 
capable of carrying out this analysis is readily available commercially, for example MacVector 
(IBI, New Haven, CT). The software typically uses standard algorithms such as the 
Kyte/Doolittle or Hopp/Woods methods for locating hydrophilic sequences which are 
characteristically found on the surface of proteins and are, therefore, likely to act as antigenic 
determinants. 
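
As a minimal sketch of this kind of analysis, the following computes a sliding-window 
hydropathy profile using the published Kyte/Doolittle scale and reports the most hydrophilic 
window. It is not the MacVector implementation; the window size of 7, the function names and 
the test peptide are assumptions made for illustration, and the peptide is not an HA4 sequence. 

```python
# Kyte/Doolittle hydropathy values (negative values are hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=7):
    """Mean Kyte/Doolittle hydropathy for each window of `window` residues."""
    scores = [KYTE_DOOLITTLE[residue] for residue in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def most_hydrophilic_window(protein, window=7):
    """Return (start index, segment, mean score) of the lowest-scoring window."""
    protein = protein.upper()
    profile = hydropathy_profile(protein, window)
    start = min(range(len(profile)), key=profile.__getitem__)
    return start, protein[start:start + window], profile[start]


# Hypothetical peptide: a charged stretch flanked by hydrophobic residues.
peptide = "IIIIIIIKKKKKKKIIIIIII"
print(most_hydrophilic_window(peptide))
```

Windows with strongly negative mean scores are candidate surface-exposed regions, whereas 
long windows with strongly positive scores suggest hydrophobic segments. 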

In the recombinant production of large amounts of proteins or polypeptides, it may be 
advisable to analyze the protein to detect putative transmembrane sequences. Such sequences 
are typically very hydrophobic and are readily detected by the use of standard sequence analysis 
software, such as MacVector (IBI, New Haven, CT). The presence of transmembrane sequences 
is often deleterious when a recombinant protein is synthesized in many expression systems, 
especially E. coli, as it leads to the production of insoluble aggregates that are difficult to 
renature into the native conformation of the protein. Deletion of transmembrane sequences 
typically does not significantly alter the conformation of the remaining protein structure. 

Moreover, transmembrane sequences, being by definition embedded within a membrane, 
are inaccessible. Antibodies to these sequences will not, therefore, generally prove useful in in 
vivo or in situ studies. Deletion of transmembrane-encoding sequences from the genes used for 
expression can be achieved by standard techniques. For example, fortuitously-placed restriction 
enzyme sites can be used to excise the desired gene fragment, or PCR™-type amplification can 
be used to amplify only the desired part of the gene. 

As used herein, the terms "engineered" and "recombinant" cells are intended to refer to a 
cell into which an exogenous DNA segment or gene, such as a cDNA or gene encoding an HA4 
protein or polypeptide, has been introduced. Therefore, engineered cells are distinguishable from 
naturally occurring cells which do not contain a recombinantly introduced exogenous DNA 
segment or gene. Engineered cells are thus cells having a gene or genes introduced through the 
hand of man. Recombinant cells include those having an introduced cDNA or genomic gene, 
and also include genes positioned adjacent to a promoter not naturally associated with the 
particular introduced gene. 

To express a recombinant HA4 protein or polypeptide, whether mutant or wild-type, in 
accordance with the present invention one would prepare an expression vector that comprises an 
HA4 protein or polypeptide-encoding nucleic acid segment under the control of one or more 
promoters. To bring a coding sequence "under the control of" a promoter, one positions the 5' 
end of the transcription initiation site of the transcriptional reading frame generally between 
about 1 and about 50 nucleotides "downstream" of (i.e., 3' of) the chosen promoter. The 
"upstream" promoter stimulates transcription of the DNA and promotes expression of the 
encoded recombinant protein. This is the meaning of "recombinant expression" in this context. 

i) Host Cells 

Many standard techniques are available to construct expression vectors containing the 
appropriate nucleic acids and transcriptional/translational control sequences in order to achieve 
protein or polypeptide expression in a variety of host-expression systems. Cell types available 
for expression include, but are not limited to, bacteria, such as E. coli and B. subtilis 
transformed with recombinant bacteriophage DNA, plasmid DNA or cosmid DNA expression 
vectors. 

Certain examples of prokaryotic hosts are E. coli strain RR1, E. coli LE392, E. coli B, E. 
coli χ1776 (ATCC No. 31537) as well as E. coli W3110 (F-, lambda-, prototrophic, ATCC No. 
27325); bacilli such as Bacillus subtilis; and other enterobacteriaceae such as Salmonella 
typhimurium, Serratia marcescens, and various Pseudomonas species. 

In general, plasmid vectors containing replicon and control sequences which are derived 
from species compatible with the host cell are used in connection with these hosts. The vector 
ordinarily carries a replication origin, as well as marking sequences which are capable of 
providing phenotypic selection in transformed cells. For example, E. coli is often transformed 
using pBR322, a plasmid derived from an E. coli species. pBR322 contains genes for ampicillin 
and tetracycline resistance and thus provides means for identifying transformed cells. The pBR 
plasmid, or other microbial plasmid or phage must also contain, or be modified to contain, 
promoters which can be used by the microbial organism for expression of its own proteins. 

In addition, phage vectors containing replicon and control sequences that are compatible 
with the host microorganism can be used as transforming vectors in connection with these hosts. 
For example, the phage lambda GEM™-11 may be utilized in making a recombinant phage 
vector which can be used to transform host cells, such as E. coli LE392. 

Further useful vectors include pIN vectors (Inouye et al., 1985); and pGEX vectors, for 
use in generating glutathione S-transferase (GST) soluble fusion proteins for later purification 
and separation or cleavage. Other suitable fusion proteins are those with β-galactosidase, 
ubiquitin, maltose binding protein (MBP) and the like. 

Promoters that are most commonly used in recombinant DNA construction include the 
β-lactamase (penicillinase), lactose and tryptophan (trp) promoter systems. While these are the 
most commonly used, other microbial promoters have been discovered and utilized, and details 
concerning their nucleotide sequences have been published, enabling those of skill in the art to 
ligate them functionally with plasmid vectors. 

The following details concerning recombinant protein production in bacterial cells, such 
as E. coli, are obtained from exemplary information on recombinant protein production in 
general, the adaptation of which to a particular recombinant expression system will be known to 
those of skill in the art. 

Bacterial cells, for example, E. coli, containing the expression vector are grown in any of 
a number of suitable media, for example, LB. The expression of the recombinant protein may be 
induced, e.g., by adding IPTG to the media or by switching incubation to a higher temperature. 
After culturing the bacteria for a further period, generally of between 2 and 24 hours, the cells 
are collected by centrifugation and washed to remove residual media. 

The bacterial cells are then lysed, for example, by disruption in a cell homogenizer and 
centrifuged to separate the dense inclusion bodies and cell membranes from the soluble cell 
components. This centrifugation can be performed under conditions whereby the dense 
inclusion bodies are selectively enriched by incorporation of sugars, such as sucrose, into the 
buffer and centrifugation at a selective speed. 

If the recombinant protein is expressed in the inclusion bodies, as is the case in many 
instances, these can be washed in any of several solutions to remove some of the contaminating 
host proteins, then solubilized in solutions containing high concentrations of urea (e.g., 8M) or 
chaotropic agents such as guanidine hydrochloride in the presence of reducing agents, such as 
β-mercaptoethanol or DTT (dithiothreitol). 

Under some circumstances, it may be advantageous to incubate the protein for several 
hours under conditions suitable for the protein to undergo a refolding process into a 
conformation which more closely resembles that of the native protein. Such conditions generally 
include low protein concentrations, less than 500 ng/ml, low levels of reducing agent, 
concentrations of urea less than 2 M and often the presence of reagents such as a mixture of 
reduced and oxidized glutathione which facilitate the interchange of disulfide bonds within the 
protein molecule. 

The refolding process can be monitored, for example, by SDS-PAGE, or with antibodies 
specific for the native molecule (which can be obtained from animals immunized with the native 
molecule or smaller quantities of recombinant protein). Following refolding, the protein can 
then be purified further and separated from the refolding mixture by chromatography on any of 
several supports including ion exchange resins, gel permeation resins or on a variety of affinity 
columns. 

In addition to prokaryotes, eukaryotic microbes, such as yeast cultures, may also be used. 
Saccharomyces cerevisiae, or common baker's yeast, is the most commonly used among 
eukaryotic microorganisms, although a number of other strains are commonly available. For 
expression in Saccharomyces, the plasmid YRp7, for example, is commonly used (Stinchcomb et 
al., 1979; Kingsman et al., 1979; Tschemper et al., 1980). This plasmid already contains the trp1 
gene which provides a selection marker for a mutant strain of yeast lacking the ability to grow 
without tryptophan, for example ATCC No. 44076 or PEP4-1 (Jones, 1977). The presence of the trp1 
lesion as a characteristic of the yeast host cell genome then provides an effective environment for 
detecting transformation by growth in the absence of tryptophan. 

Suitable promoting sequences in yeast vectors include the promoters for 
3-phosphoglycerate kinase (Hitzeman et al., 1980) or other glycolytic enzymes (Hess et al., 
1968; Holland et al., 1978), such as enolase, glyceraldehyde-3-phosphate dehydrogenase, 
hexokinase, pyruvate decarboxylase, phosphofructokinase, glucose-6-phosphate isomerase, 
3-phosphoglycerate mutase, pyruvate kinase, triosephosphate isomerase, phosphoglucose 
isomerase, and glucokinase. In constructing suitable expression plasmids, the termination 
sequences associated with these genes are also ligated into the expression vector 3' of the 
sequence desired to be expressed to provide polyadenylation of the mRNA and termination. 

Other suitable promoters, which have the additional advantage of transcription controlled 
by growth conditions, include the promoter region for alcohol dehydrogenase 2, isocytochrome 
C, acid phosphatase, degradative enzymes associated with nitrogen metabolism, and the 
aforementioned glyceraldehyde-3-phosphate dehydrogenase, and enzymes responsible for 
maltose and galactose utilization. 

In addition to micro-organisms, cultures of cells derived from multicellular organisms 
may also be used as hosts. In principle, any such cell culture is workable, whether from 
vertebrate or invertebrate culture. In addition to mammalian cells, these include insect cell 
systems infected with recombinant virus expression vectors (e.g., baculovirus); and plant cell 
systems infected with recombinant virus expression vectors (e.g., cauliflower mosaic virus, 
CaMV; tobacco mosaic virus, TMV) or transformed with recombinant plasmid expression 
vectors (e.g., Ti plasmid) containing one or more HA4 protein or polypeptide coding sequences. 

In a useful insect system, Autographa californica nuclear polyhedrosis virus (AcNPV) 
is used as a vector to express foreign genes. The virus grows in Spodoptera frugiperda cells. 
The HA4 protein or polypeptide coding sequences are cloned into non-essential regions (for 
example the polyhedrin gene) of the virus and placed under control of an AcNPV promoter (for 
example the polyhedrin promoter). Successful insertion of the coding sequences results in the 
inactivation of the polyhedrin gene and production of non-occluded recombinant virus (i.e., virus 
lacking the proteinaceous coat coded for by the polyhedrin gene). These recombinant viruses are 
then used to infect Spodoptera frugiperda cells in which the inserted gene is expressed (e.g., 
U.S. Patent 4,215,051). 

Examples of useful mammalian host cell lines are VERO and HeLa cells, Chinese 
hamster ovary (CHO) cell lines, WI38, BHK, COS-7, 293, HepG2, 3T3, RIN and MDCK cell 
lines. In addition, a host cell strain may be chosen that modulates the expression of the inserted 
sequences, or modifies and processes the gene product in the specific fashion desired. Such 
modifications (e.g., glycosylation) and processing (e.g., cleavage) of protein products may be 
important for the function of the protein. 

Different host cells have characteristic and specific mechanisms for the post-translational 
processing and modification of proteins. Appropriate cell lines or host systems can be chosen to 
ensure the correct modification and processing of the foreign protein expressed. To this end, 
eukaryotic host cells which possess the cellular machinery for glycosylation, intracellular 
transport, high expression and DNA replication may be used if desired, with a cell that allows for 
high expression being preferred. 

The ability of certain viruses to infect cells or enter cells via receptor-mediated 
endocytosis, and to integrate into the host cell genome and express viral genes stably and efficiently 
has made them attractive candidates for the transfer of foreign nucleic acids into cells (e.g., 
mammalian cells). Non-limiting examples of virus vectors that may be used to deliver a nucleic 
acid of the present invention are described below. 

ii) Adenoviral Vectors 

A particular method for delivery of the nucleic acid involves the use of an adenovirus 
expression vector. Although adenovirus vectors are known to have a low capacity for integration 
into genomic DNA, this feature is counterbalanced by the high efficiency of gene transfer 
afforded by these vectors. "Adenovirus expression vector" is meant to include those constructs 
containing adenovirus sequences sufficient to (a) support packaging of the construct and (b) to 
ultimately express a tissue or cell-specific construct that has been cloned therein. Knowledge of 
the genetic organization of adenovirus, a 36 kb, linear, double-stranded DNA virus, allows 
substitution of large pieces of adenoviral DNA with foreign sequences up to 7 kb (Grunhaus and 
Horwitz, 1992). 

iii) AAV Vectors 

The nucleic acid may be introduced into the cell using adenovirus assisted transfection. 
Increased transfection efficiencies have been reported in cell systems using adenovirus coupled 
systems (Kelleher and Vos, 1994; Cotten et al., 1992; Curiel, 1994). Adeno-associated virus 
(AAV) is an attractive vector system for use in the present invention as it has a high frequency of 
integration and it can infect nondividing cells, thus making it useful for delivery of genes into 
mammalian cells, for example, in tissue culture (Muzyczka, 1992) or in vivo. AAV has a broad 
host range for infectivity (Tratschin et al., 1984; Laughlin et al., 1986; Lebkowski et al., 1988; 
McLaughlin et al., 1988). Details concerning the generation and use of rAAV vectors are 
described in U.S. Patents 5,139,941 and 4,797,368, each incorporated herein by reference. 

iv) Retroviral Vectors 

Retroviruses are valuable delivery vectors due, in part, to their ability to integrate their 
genes into the host genome, transfer a large amount of foreign genetic material, infect a 
broad spectrum of species and cell types, and be packaged in special cell lines (Miller, 
1992). In order to construct a retroviral vector, a nucleic acid of interest is inserted into the viral 
genome in the place of certain viral sequences to produce a virus that is replication-defective. In 
order to produce virions, a packaging cell line containing the gag, pol, and env genes but without 
the LTR and packaging components is constructed (Mann et al., 1983). When a recombinant 
plasmid containing a cDNA, together with the retroviral LTR and packaging sequences, is 
introduced into a special cell line (e.g., by calcium phosphate precipitation), the 
packaging sequence allows the RNA transcript of the recombinant plasmid to be packaged into 
viral particles, which are then secreted into the culture media (Nicolas and Rubenstein, 1988; 
Temin, 1986; Mann et al., 1983). The media containing the recombinant retroviruses is then 
collected, optionally concentrated, and used for gene transfer. Retroviral vectors are able to 
infect a broad variety of cell types. However, integration and stable expression require the 
division of host cells (Paskind et al., 1975). 

Lentiviruses are complex retroviruses, which, in addition to the common retroviral genes 
gag, pol, and env, contain other genes with regulatory or structural function. Lentiviral vectors 
are well known in the art (see, for example, Naldini et al., 1996; Zufferey et al., 1997; Blomer et 
al., 1997; U.S. Patents 6,013,516 and 5,994,136). Some examples of lentivirus include the 
Human Immunodeficiency Viruses: HIV-1, HIV-2 and the Simian Immunodeficiency Virus: 
SIV. Lentiviral vectors have been generated by multiply attenuating the HIV virulence genes, 
for example, the genes env, vif, vpr, vpu and nef are deleted making the vector biologically safe. 

Recombinant lentiviral vectors are capable of infecting non-dividing cells and can be
used for both in vivo and ex vivo gene transfer and expression of nucleic acid sequences. For
example, a recombinant lentivirus capable of infecting a non-dividing cell, wherein a suitable host
cell is transfected with two or more vectors carrying the packaging functions, namely gag, pol
and env, as well as rev and tat, is described in U.S. Pat. No. 5,994,136, incorporated herein by
reference. One may target the recombinant virus by linkage of the envelope protein with an
antibody or a particular ligand for targeting to a receptor of a particular cell type. For example, by
inserting a sequence (including a regulatory region) of interest into the viral vector, along with
another gene that encodes the ligand for a receptor on a specific target cell, the vector is made
target-specific.

v) Other Viral Vectors 

Other viral vectors may be employed as vaccine constructs in the present invention. 
Vectors derived from viruses such as vaccinia virus (Ridgeway, 1988; Baichwal and Sugden,
1986; Coupar et al., 1988), sindbis virus, cytomegalovirus and herpes simplex virus may be
employed. They offer several attractive features for various mammalian cells (Friedmann, 1989;
Ridgeway, 1988; Baichwal and Sugden, 1986; Coupar et al., 1988; Horwich et al., 1990).

vi) Delivery Using Modified Viruses 

A nucleic acid to be delivered may be housed within an infective virus that has been
engineered to express a specific binding ligand. The virus particle will thus bind specifically to
the cognate receptors of the target cell and deliver the contents to the cell. A novel approach
designed to allow specific targeting of retrovirus vectors was developed based on the chemical
modification of a retrovirus by the addition of lactose residues to the viral envelope.
This modification can permit the specific infection of hepatocytes via sialoglycoprotein
receptors.

Another approach to targeting of recombinant retroviruses was designed in which
biotinylated antibodies against a retroviral envelope protein and against a specific cell receptor
were used. The antibodies were coupled via the biotin components by using streptavidin
(Roux et al., 1989). Using antibodies against major histocompatibility complex class I and class
II antigens, they demonstrated the infection of a variety of human cells that bore those surface
antigens with an ecotropic virus in vitro (Roux et al., 1989).

vii) Other Signals 

Specific initiation signals may also be required for efficient translation of HA4 coding
sequences. These signals include the ATG initiation codon and adjacent Kozak sequences.
Exogenous translational control signals, including the ATG initiation codon, may additionally
need to be provided. One of ordinary skill in the art would readily be capable of determining this
and providing the necessary signals. It is well known that the initiation codon must be in-frame
(or in-phase) with the reading frame of the desired coding sequence to ensure translation of the
entire insert. These exogenous translational control signals and initiation codons can be of a
variety of origins, both natural and synthetic. The efficiency of expression may be enhanced by
the inclusion of appropriate transcription enhancer elements and transcription terminators (Bittner et
al., 1987).
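
By way of illustration only, the in-frame requirement described above can be checked computationally before a construct is assembled. The following Python sketch is not part of the disclosed methods; the function name `is_in_frame`, the positions, and the short demonstration sequence are hypothetical and are chosen solely to show the arithmetic (an offset divisible by three, with no intervening stop codon).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_in_frame(construct: str, atg_pos: int, cds_start: int) -> bool:
    """Return True if the ATG at atg_pos is in frame with the coding
    sequence beginning at cds_start (0-based positions) and no stop codon
    lies between them in that frame."""
    seq = construct.upper()
    if seq[atg_pos:atg_pos + 3] != "ATG":
        raise ValueError("no ATG initiation codon at atg_pos")
    # The initiation codon must lie upstream, at a multiple of three bases.
    if cds_start < atg_pos or (cds_start - atg_pos) % 3 != 0:
        return False
    # Walk codon by codon from the ATG to the start of the coding sequence.
    for i in range(atg_pos, cds_start, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return False
    return True


# Hypothetical construct: a Kozak-like leader, an exogenous ATG, one linker
# codon, then a made-up stretch standing in for the coding sequence.
demo = "GCCACC" + "ATG" + "GCT" + "AAAGGGTTT"
print(is_in_frame(demo, atg_pos=6, cds_start=12))  # in frame
print(is_in_frame(demo, atg_pos=6, cds_start=13))  # shifted by one base
```

A construct that fails this check would be expected to require an adjusted linker length or a relocated initiation codon.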

In eukaryotic expression, one will also typically desire to incorporate into the 
transcriptional unit an appropriate polyadenylation site (e.g., 5'-AATAAA-3') if one was not
contained within the original cloned segment. Typically, the poly A addition site is placed about 
30 to 2000 nucleotides "downstream" of the termination codon of the protein at a position prior 
to transcription termination. 
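
As a non-limiting sketch, the placement guideline above (a polyadenylation signal about 30 to 2000 nucleotides downstream of the termination codon) can be verified on a candidate transcriptional unit as follows. The function names and the toy sequence are hypothetical; only the AATAAA motif and the 30 to 2000 nucleotide window are taken from the text.

```python
def polya_offsets(seq: str, stop_codon_end: int, signal: str = "AATAAA"):
    """Return the distance (in nucleotides) from the end of the termination
    codon to the start of each downstream occurrence of the signal."""
    seq = seq.upper()
    offsets = []
    i = seq.find(signal, stop_codon_end)
    while i != -1:
        offsets.append(i - stop_codon_end)
        i = seq.find(signal, i + 1)
    return offsets


def has_suitable_polya(seq: str, stop_codon_end: int,
                       min_nt: int = 30, max_nt: int = 2000) -> bool:
    """True if at least one signal falls within the stated window."""
    return any(min_nt <= d <= max_nt
               for d in polya_offsets(seq, stop_codon_end))


# Toy unit: ATG GCT TAA, a 40-nt spacer, then the signal.
unit = "ATGGCTTAA" + "C" * 40 + "AATAAA" + "C" * 10
print(polya_offsets(unit, stop_codon_end=9))
print(has_suitable_polya(unit, stop_codon_end=9))
```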

For long-term, high-yield production of recombinant HA4 proteins, stable expression is
preferred. For example, cell lines that stably express constructs encoding HA4 proteins or
polypeptides may be engineered. Rather than using expression vectors that contain viral origins
of replication, host cells can be transformed with vectors controlled by appropriate expression
control elements (e.g., promoter, enhancer, transcription terminators, polyadenylation sites, etc.),
and a selectable marker. Following the introduction of foreign DNA, engineered cells may be
allowed to grow for 1-2 days in an enriched medium, and then are switched to a selective medium.
The selectable marker in the recombinant plasmid confers resistance to the selection and allows
cells to stably integrate the plasmid into their chromosomes and grow to form foci, which in turn
can be cloned and expanded into cell lines.

viii) Selection Systems 

A number of selection systems may be used, including, but not limited to, the herpes
simplex virus thymidine kinase (Wigler et al., 1977), hypoxanthine-guanine
phosphoribosyltransferase (Szybalska et al., 1962) and adenine phosphoribosyltransferase genes
(Lowry et al., 1980), in tk-, hgprt- or aprt- cells, respectively. Also, anti-metabolite resistance
can be used as the basis of selection for dhfr, which confers resistance to methotrexate (Wigler et
al., 1980; O'Hare et al., 1981); gpt, which confers resistance to mycophenolic acid (Mulligan et al.,
1981); neo, which confers resistance to the aminoglycoside G418 (Colberre-Garapin et al., 1981);
and hygro, which confers resistance to hygromycin (Santerre et al., 1984).

It is contemplated that the HA4 of the invention may be "overexpressed," i.e., expressed 
in increased levels relative to its natural expression in osteoblast cells, or even relative to the 
expression of other proteins in the recombinant host cell. Such overexpression may be assessed 
by a variety of methods, including radio-labeling and/or protein purification. However, direct 
methods are preferred, for example, those involving SDS/PAGE and protein staining or western 
blotting, followed by quantitative analyses, such as densitometric scanning of the resultant gel or 
blot. A specific increase in the level of the recombinant protein or polypeptide in comparison to 
the level in natural osteoblasts is indicative of overexpression, as is a relative abundance of the 
specific protein in relation to the other proteins produced by the host cell and, e.g., visible on a 
gel. 
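
For illustration, the densitometric comparison described above reduces to a ratio of band intensities. The sketch below assumes, as a convention not stated in the text, that each HA4 band is first normalized to a loading-control band from the same lane; the function names and the intensity values are hypothetical.

```python
def normalized_signal(band: float, loading_control: float) -> float:
    """Band intensity divided by the loading-control intensity of its lane."""
    if loading_control <= 0:
        raise ValueError("loading-control intensity must be positive")
    return band / loading_control


def fold_overexpression(recombinant_band: float, recombinant_loading: float,
                        reference_band: float, reference_loading: float) -> float:
    """Ratio of normalized HA4 signal in the recombinant host cell to that in
    the reference sample (e.g., natural osteoblasts)."""
    reference = normalized_signal(reference_band, reference_loading)
    if reference <= 0:
        raise ValueError("reference signal must be positive")
    return normalized_signal(recombinant_band, recombinant_loading) / reference


# Hypothetical densitometry readings in arbitrary units.
print(fold_overexpression(900.0, 300.0, 150.0, 300.0))
```

A ratio clearly greater than one would be consistent with overexpression in the sense used above; the cutoff applied in practice is left to the skilled artisan.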

IV. Development of HA4-Related Agents and Assays 

It is contemplated that the HA4-related agents described herein will be useful in many 
areas, for example in screening assays, monitoring amounts and qualities of HA4 in clinical 
samples or to target the expression of foreign genes into osteoblasts, all as described in more 
detail herein. As used herein, the term "HA4-related agents" refers to full length as well as
partial DNA segments; other members of the HA4 family; isolated and purified native HA4 as
well as recombinantly produced HA4; antibodies raised to any of the above forms; and cells and
animals engineered to overproduce HA4.

The HA4-related agents described herein may, of course, additionally be used to search 
for molecules that modulate the expression and/or function of HA4 (e.g., naturally occurring 
proteins, chemicals, synthetic peptides, carbohydrates, lipids, recombinant proteins, cell extracts,
and supernatant, etc.). This may, for example, involve the use of HA4 transfectants to search for 
molecules that bind to HA4 in the cell to enhance its activity thereby enhancing bone production. 

Another contemplated use of the agents of the invention is to regulate cell differentiation,
for example, to regulate the differentiation of precursor cells, such as mesenchymal precursor
cells, to form osteoblasts. In another example, one may establish osteoblast lines by introducing
HA4 promoters. This may be accomplished by using the 5'-flanking region of the HA4 gene to 
drive cellular differentiation toward osteoblasts or by using oncogenes (e.g., c-myc) driven by 
osteoblast-specific promoters. 

A. HA4-Related Agents and Assays

The following reagents are included in the present invention as "HA4-related reagents":
(a) DNA segments of HA4, including the 5'- and 3'-flanking regions, (b) RNA segments of sense
or anti-sense strands of HA4, including truncated or mutated transcripts, (c) HA4 polypeptides or
proteins, including truncated or mutated forms and their biological equivalents, (d) polyclonal or
monoclonal antibodies against HA4, (e) cell lines that express HA4, (f) vectors designed to
produce HA4 polypeptides or proteins, (g) cell lines that are engineered to express HA4, and (h)
transgenic animals lacking at least one functional HA4 allele, or comprising an expression
cassette with an HA4 promoter linked to a screenable marker.

The following assays that employ HA4-related reagents are also included in the present
invention as "HA4-related assays": (a) assays to detect HA4 DNA, including Southern blotting,
genomic PCR™, colony and plaque hybridization, and slot blotting; (b) assays to detect HA4
RNA, including northern blotting, RT-PCR™, in situ hybridization, primer extension assay, and
RNase protection assay; (c) assays to detect HA4 polypeptides or proteins, including ELISA,
Western blotting, immunoprecipitation, radioimmuno-absorption and -competition assays, and
immunofluorescence and immunohistochemical stainings; and (d) assays to search for agents
that modulate HA4 expression and/or function. Detailed methodologies for these assays will be
described in the following sections.

B. Assays to Examine HA4 Nucleic Acids

Nucleic acid segments of HA4 or related molecules that exhibit significant homologies
with, or that contain portions of, HA4 will be used as probes to detect members of the HA4
family of genes. The HA4 family of genes is defined as genes that are detectable with at least
one of these probes. For this purpose, standard assays, including Southern blotting, PCR™,
colony and plaque hybridization, and slot blot hybridization will be employed under various
conditions with different degrees of stringency as described previously. Specimens to be tested
include cDNA libraries, genomic DNA, cDNA, and DNA fragments isolated from cells or
tissues. These assays may be modified to detect selectively mutated HA4 DNA. For this
purpose, Southern blotting or PCR™ will be employed to detect or amplify the mutated DNA
segments. These segments will then be sequenced to identify the mutated nucleotides.
Alternatively, a combination of selected restriction enzymes will be employed to reveal
molecular heterogeneity in Southern blotting. Moreover, these assays may be modified to detect
selectively different domains or different portions of the HA4 nucleotide sequences. For this
aim, one may employ probes or primers for different portions of the nucleotide sequences. More
sophisticated methods may be employed to screen point mutations. For example, it is
contemplated that one may choose a PCR™-single-strand conformation polymorphism
(PCR™-SSCP) analysis (Sarkar et al., 1995).

Nucleotides of HA4 (SEQ ID NO:1) or related nucleotides that exhibit significant
homologies with, or that contain portions of, HA4 will be used as probes to detect transcripts of
the HA4 family of genes. For this purpose, standard assays, including northern blotting,
RT-PCR™, in situ hybridization, primer extension assay and RNase protection assay will be
employed under various conditions with different degrees of stringency as described previously.
Specimens to be tested include total RNA and mRNA isolated from cells or tissues and cell and
tissue samples themselves obtained from living animals or patients. These assays may be
modified to detect selectively the transcripts for different domains or different isoforms. For this
purpose, the inventors will employ probes or primers for different portions of the nucleotide
sequences. Northern blotting may be used to detect selectively different isoforms. For this
purpose, oligonucleotide probes will be constructed, each covering different portions of the
nucleotide sequences. To define the nucleotides that are deleted from the original sequence,
RNase protection assays may be employed. Detection of mutated RNA is also included in the
present invention. For this aim, RNA isolated from osteoblasts will be analyzed by northern
blotting or RT-PCR™.

It is further contemplated that assays may be designed to detect selectively different RNA
species. Similar methods using RT-PCR™ may be employed to identify spliced variants and
even other isoforms that are produced by other mechanisms. Alternatively, Northern blotting
may be used to detect selectively different isoforms. For this purpose, oligonucleotide probes
will be constructed, each covering different portions of the nucleotide sequences. To define the
nucleotides that are deleted from the original sequence, RNase protection assays may be
employed.

C. Assays to Examine HA4 at Protein or Polypeptide Levels 

Antibodies against HA4 will be used to detect HA4 proteins or polypeptides. For this
purpose, standard assays, including ELISA, western blotting, immunoprecipitation,
radioimmuno-absorption and radioimmuno-competition assays, and immunofluorescence and
immunohistochemical stainings will be employed under various conditions with different
degrees of specificity and sensitivity. Specimens to be tested include viable cells, whole cellular
extracts, and different subcellular fractions of established cell lines, as well as cells, tissues, and
body fluids isolated from living animals or patients. These assays may be modified to detect
selectively different epitopes, domains, or isoforms of HA4 polypeptides or proteins. For this
purpose, the inventors will develop and employ a panel of MAb against different epitopes or
domains.

D. Assays to Search for Reagents That Modulate the Activity of HA4 and the 
Expression of HA4 Gene 

Finally, the HA4-related assays described above may also be used to search for
molecules that modulate HA4-dependent activity, by a method comprising admixing an
HA4-expressing cell with a candidate substance and identifying whether the candidate substance
inhibits or stimulates the expression of HA4. The HA4-expressing cell may be an osteoblast.
Alternatively, the HA4-expressing cell may comprise an engineered cell that expresses
recombinant HA4.

Screening will determine whether the candidate substance affects the expression of HA4.
For this purpose, cells will be treated with the candidate substance(s) either individually or in
combination and then examined for enhanced HA4 activity at the levels of mRNA, protein, and
function. Alternatively, the candidate substances may be tested in vivo by administering them to live
animals such as mice. In this case, cells of interest will be isolated from mice after treatment
with the candidate substance(s) or combinations thereof and examined in vitro for enhanced HA4
activity, once again, by measuring the levels of mRNA, protein, and/or function. In performing
these assays, it will be important to also examine the effect(s) of candidate substances on the
activity of different isoforms of HA4. In preferred embodiments, agents that enhance or
stimulate HA4 expression will be formulated in a pharmaceutically acceptable medium.

A candidate substance(s) that inhibits the activity of HA4 within osteoblasts may be
identified by inhibition of osteoblast differentiation or bone formation. The invention thus
provides agents that inhibit HA4-mediated activation of osteoblasts. In preferred embodiments,
the agent of the invention will be formulated in a pharmaceutically acceptable medium.

In further embodiments, the present invention concerns a method for identifying new
osteoblast interaction inhibitory/stimulatory compounds, which may be termed "candidate
substances." It is contemplated that this screening technique will prove useful in the general
identification of a compound that will serve the purpose of inhibiting/stimulating osteoblast
activation. Stimulators of osteoblast activation have therapeutic applications in conditions such as
osteoporosis and in bone reconstruction and bone fracture repair.

It is further contemplated that useful compounds in this regard will in no way be limited 
to antibodies. In fact, it may prove to be the case that the most useful pharmacological 
compounds for identification through application of the screening assay will be non-peptidyl in 
nature and serve to inhibit the osteoblast activation process through a tight binding or other 
chemical interaction. 

Candidate molecules may be examined for their capacities to suppress or to enhance the 
expression of HA4 by osteoblasts at mRNA or protein levels. For this aim, osteoblasts will be 
incubated with test samples and then examined for HA4 expression by northern blotting, RT- 
PCR™, in situ hybridization, primer extension assay and RNase protection assay (at RNA levels) or 
by ELISA, western blotting, immunoprecipitation, radioimmuno-absorption and competition assays, 
and immunofluorescence and immunohistochemical stainings (at protein levels). 
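
Purely as an illustrative sketch, the outcome of such a screen can be summarized by comparing HA4 expression in treated osteoblasts with that in untreated controls. The two-fold threshold, the function name `classify_candidate`, and the category labels below are assumptions introduced for the example; the text does not specify any cutoff.

```python
def classify_candidate(treated: float, control: float,
                       threshold: float = 2.0) -> str:
    """Classify a candidate substance by the ratio of HA4 expression
    (mRNA or protein signal, same units) in treated versus control cells.
    The default two-fold threshold is a hypothetical choice."""
    if control <= 0 or treated < 0:
        raise ValueError("control must be positive and treated non-negative")
    ratio = treated / control
    if ratio >= threshold:
        return "stimulator"
    if ratio <= 1.0 / threshold:
        return "inhibitor"
    return "no effect"


# Hypothetical readings, e.g. normalized RT-PCR or ELISA signals.
for name, treated in [("compound A", 4.0), ("compound B", 0.4), ("compound C", 1.2)]:
    print(name, classify_candidate(treated, control=1.0))
```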

While a candidate substance may be any type of substance that may interact with HA4 to 
enhance its activity and stimulate bone formation, one preferred method for obtaining candidate 
substances will be by utilizing combinatorial chemistry techniques. Such techniques are well 
known to the skilled artisan and include methods as described in VanHijfte et al. (1999) and Floyd
et al. (1999), both incorporated herein by reference.

E. Transgenic Animals and Cells and HA4 Knockouts
1. Transgenic Animals and Cells

Cells, cell lines and animals deficient for the HA4 gene can be generated and utilized, for 
example, as part of the identification of specific modulators such as stimulators or inhibitors of 
osteoblast gene expression and activity in addition to the identification assays described above.
Thus, HA4 deficient cells, cell lines and animals will frequently be used herein as a 
representative example. 

The term "HA4-deficient," as used herein, refers to cells, cell lines and/or animals which
exhibit a lower level of functional HA4 activity than corresponding cells, cell lines or animals
whose cells contain two normal, wild-type copies of the HA4 gene. A representative
HA4-deficient, or "knockout," animal is an HA4-deficient mouse. Knockout animals are well
known to those of skill in the art. See, for example, Horinouchi et al. (1995); and Otterbach and
Stoffel (1995), both of which are incorporated herein by reference in their entirety. Techniques
for generating additional HA4 knockout cells, cell lines and animals are described below. Cells
that are heterozygous and homozygous for knock-outs are contemplated.

Cells and cell lines deficient in HA4 activity can be derived from HA4 knockout animals, 
utilizing standard techniques well known to those of skill in the art. Such animals may be used to 
derive a cell line which may be used as an assay substrate in culture. While primary cultures 
may be utilized, the generation of continuous cell lines is preferred. For examples of techniques 
which may be used to derive a continuous cell line from the transgenic animals, see Small et al.,
1985. Such techniques for generating cells and cell lines can also be utilized in the context of the
transgenic and genetically engineered animals described below. 

With respect to HA4 deficient cells, such cells can, for example, include cells taken from 
and cell lines derived from patients exhibiting bone disorders, such as osteoporosis. Additional 
HA4-deficient cells and cell lines can be generated using well known recombinant DNA
techniques such as, for example, site-directed mutagenesis, to introduce mutations into HA4 
gene sequences which will disrupt HA4 activity. 

HA4-deficient cells and animals can be generated using the HA4 nucleotide sequences 
described in the present invention. Such animals can be any species, including but not limited to 
mice, rats, rabbits, guinea pigs, pigs, micro-pigs, and non-human primates, e.g., baboons, squirrel
monkeys and chimpanzees. 

Any technique known in the art may be used to introduce a transgene, such as an
inactivating gene sequence, into animals to produce the founder lines of transgenic animals.
Such techniques include, but are not limited to, pronuclear microinjection (U.S. Patent
4,873,191); retrovirus mediated gene transfer into germ lines (Van der Putten et al., 1985); gene
targeting in embryonic stem cells (Thompson et al., 1989); electroporation of embryos (Lo,
1983); and sperm-mediated gene transfer (Lavitrano et al., 1989). For a review of such
techniques, see Gordon, 1989, which is incorporated by reference herein in its entirety.

As listed above, standard embryonal stem cell (ES) techniques can, for example, be
utilized for generation of HA4 knockouts. ES cells can be obtained from preimplantation
embryos cultured in vitro (see, e.g., Evans et al., 1981; Bradley et al., 1984; Gossler et al., 1986;
Robertson et al., 1986; Wood et al., 1993). The introduced ES cells thereafter colonize the
embryo and contribute to the germ line of a resulting chimeric animal (Jaenisch, 1988).

To accomplish HA4 gene disruptions, the technique of site-directed inactivation via gene
targeting may be used (Thomas and Capecchi, 1987; reviewed in Frohman et al., 1989;
Capecchi, 1989; Barribault et al., 1989; Wagner, 1990; and Bradley et al., 1992).

Further, standard techniques such as, for example, homologous recombination, coupled
with HA4 sequences, can be utilized to inactivate or alter any HA4 genetic region desired. A
number of strategies can be utilized to detect or select rare homologous recombinants. For
example, PCR can be used to screen pools of transformant cells for homologous insertion,
followed by screening of individual clones (Kim et al., 1988; Kim et al., 1991). Alternatively, a
positive genetic selection approach can be taken in which a marker gene is constructed which
will only be active if homologous insertion occurs, allowing these recombinants to be selected
directly (Sedivy et al., 1989). Additionally, the positive-negative selection (PNS) method can be
utilized (Mansour et al., 1988; Capecchi, 1989; Capecchi, 1989). Utilizing the PNS method,
nonhomologous recombinants are selected against by using the Herpes Simplex virus thymidine
kinase (HSV-TK) gene and selecting against its nonhomologous insertion with herpes drugs such
as ganciclovir or FIAU. By such counter-selection, the number of homologous recombinants in
the surviving transformants is increased.

ES cells generated via techniques such as these, when introduced into the germline of a
nonhuman animal, make possible the generation of non-mosaic, i.e., non-chimeric progeny. Such
progeny will be referred to herein as founder animals. Once the founder animals are produced,
they may be bred, inbred, outbred, or crossbred to produce colonies of the particular animal.

Taking the generation of an HA4 knockout mouse as an example of the above, standard
techniques can first be utilized to isolate mouse HA4 genomic sequences. Such sequences
can be routinely isolated by utilizing standard molecular techniques and human HA4 nucleotide
sequences as probes and/or as PCR primers, as discussed below.

An inactive allele of the HA4 gene can then be generated by targeted mutagenesis using
standard procedures of combined positive and negative selection for homologous recombination
in embryonic stem (ES) cells. HA4 genomic clones can be isolated, for example, from a 129/sv
mouse genomic library, which is isogenic with the ES cells to be used for gene targeting. The
null targeting vector can be constructed containing homologous sequences flanking both 5' and 3'
sides of a deletion. The vector carries a resistance marker, e.g., a neomycin resistance marker
(Neo) for positive selection and a negative marker, e.g., a thymidine kinase (TK) marker, for
negative selection.

Briefly, vector DNA can be electroporated into W9.5 ES cells (male-derived), which can 
then be cultured and selected on feeder layers of mouse embryonic fibroblasts derived from 
transgenic mice expressing a Neo gene. G418 (350 mg/ml; for gain of Neo) and ganciclovir (2 
mM; for loss of TK) can be added to the culture medium to select for resistant ES cell colonies
that have undergone homologous recombination at the HA4 gene. Recombinants are
identified by screening genomic DNA from ES cell colonies by Southern blot hybridization 
analysis. Correctly targeted ES cell clones, which also carry a normal complement of 40
chromosomes, can be used to derive mice carrying the mutation. ES cells can be micro-injected
into blastocysts at 3.5 days post-coitum obtained from C57BL/6J mice, and blastocysts will be
re-implanted into pseudopregnant female mice, which serve as foster mothers. Chimeric
progeny derived largely from the ES cells will be identified by a high proportion of agouti coat
color (the color of the 129/sv strain of origin of the ES cells) against the black coat color derived
from the C57BL/6J host blastocyst. Male chimeric progeny will be tested for germline
transmission of the mutation by breeding with C57BL/6J females. Agouti progeny derived from
these crosses will be expected to be heterozygous for the mutation, which will be confirmed by
Southern blot analysis. These F1 heterozygous progeny will be inter-bred to generate F2 litters
containing progeny of all three genotypes (wild-type, heterozygous and homozygous mutants)
for phenotypic analyses.
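
As a worked illustration of the intercross described above, the expected genotype proportions among the F2 progeny follow from simple Mendelian segregation of a single targeted allele. The sketch below assumes fully viable genotypes and unbiased transmission, which the text does not assert; the allele symbols "+" (wild-type) and "-" (targeted) and the function name are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def intercross(parent1=("+", "-"), parent2=("+", "-")):
    """Expected genotype fractions from a cross of two parents, each given
    as a pair of alleles; every gamete combination is equally likely."""
    counts = Counter("".join(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# F1 heterozygote x F1 heterozygote.
for genotype, fraction in sorted(intercross().items()):
    print(genotype, fraction)
```

Under these assumptions roughly one quarter of the F2 progeny would be expected to be homozygous mutants; a marked deficit of that class at genotyping could suggest reduced viability.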

2. Methods of Making Transgenic Animals

Thus, a particular embodiment of the present invention provides transgenic animals
which are knockouts for the HA4 gene and thus serve as models for bone disorders involving
HA4, and also provides an assay system for identification of modulators, including both
inhibitors and stimulators, of HA4 gene expression as well as HA4 functional activity.

Although the present discussion refers to transgenic mice, it is understood that mice are
merely an exemplary model animal, and any other mammal routinely used as a model
animal (e.g., rats, guinea pigs, rabbits, cats, dogs, pigs and the like) may be generated using the
technology described herein. In a general aspect, a transgenic animal is produced by the
integration of a given transgene into the genome in a manner that permits the expression of the
transgene. The terms "animal" and "non-human animal," as used herein, include all vertebrate
animals, except humans. They also include individual animals in all stages of development,
including embryonic and fetal stages. A "transgenic animal" is any animal containing one or 
more cells bearing genetic information received, directly or indirectly, by deliberate genetic 
manipulation at the subcellular level. The genetic manipulation can be performed by any method 
of introducing genetic material to a cell, including, but not limited to, microinjection, infection 
with a recombinant virus, particle bombardment or electroporation. The term is not intended to 
encompass classical cross-breeding or in vitro fertilization, but rather is meant to encompass 
animals in which one or more cells receive a recombinant DNA molecule. This molecule may 
be integrated within a chromosome, or it may be extrachromosomally replicating DNA. The 
genetic information may be foreign to the species of animal to which the recipient belongs, 
foreign only to the individual recipient, or genetic information already possessed by the recipient 
expressed at a different level, a different time, or in a different location than the native gene. 

Methods for producing transgenic animals are generally described by Wagner and Hoppe
(U.S. Patent 4,873,191, which is incorporated herein by reference), Brinster et al. (1985; which
is incorporated herein by reference in its entirety) and in "Manipulating the Mouse Embryo: A
Laboratory Manual," 2nd edition (eds., Hogan, Beddington, Costantini and Lacy, Cold Spring
Harbor Laboratory Press, 1994; which is incorporated herein by reference in its entirety).

Typically, a gene flanked by genomic sequences is transferred by microinjection into a 
fertilized egg. The microinjected eggs are implanted into a host female, and the progeny are 
screened for the expression of the transgene. Transgenic animals may be produced from the 
fertilized eggs from a number of animals including, but not limited to reptiles, amphibians, birds, 
mammals, and fish. Within a particularly preferred embodiment, transgenic mice are generated 
which are knockouts of HA4. 

DNA clones for microinjection can be prepared by any means known in the art. For
example, DNA clones for microinjection can be cleaved with enzymes appropriate for removing
the bacterial plasmid sequences, and the DNA fragments electrophoresed on 1% agarose gels in
TBE buffer, using standard techniques. The DNA bands are visualized by staining with
ethidium bromide, and the band containing the expression sequences is excised. The excised
band is then placed in dialysis bags containing 0.3 M sodium acetate, pH 7.0. DNA is
electroeluted into the dialysis bags, extracted with a 1:1 phenol:chloroform solution and
precipitated by two volumes of ethanol. The DNA is redissolved in 1 ml of low salt buffer (0.2
M NaCl, 20 mM Tris, pH 7.4, and 1 mM EDTA) and purified on an Elutip-D™ column. The
column is first primed with 3 ml of high salt buffer (1 M NaCl, 20 mM Tris, pH 7.4, and 1 mM
EDTA) followed by washing with 5 ml of low salt buffer. The DNA solutions are passed
through the column three times to bind DNA to the column matrix. After one wash with 3 ml of
low salt buffer, the DNA is eluted with 0.4 ml high salt buffer and precipitated by two volumes
of ethanol. DNA concentrations are measured by absorption at 260 nm in a UV
spectrophotometer. For microinjection, DNA concentrations are adjusted to 3 ng/ml in 5 mM
Tris, pH 7.4 and 0.1 mM EDTA.
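
By way of example, the concentration measurement and adjustment steps above involve two small calculations, sketched here. The sketch relies on the widely used approximation that an A260 of 1.0 at a 1 cm path length corresponds to about 50 µg/ml of double-stranded DNA, which is not stated in the text; the function names and the numerical readings are hypothetical, and the target concentration is passed in as a parameter in the same units as the stock.

```python
DSDNA_UG_PER_ML_PER_A260 = 50.0  # common approximation, 1 cm path length


def dsdna_concentration(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate the dsDNA concentration (ug/ml) of the undiluted sample
    from the absorbance at 260 nm of a diluted aliquot."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def dilution_volumes(stock_conc: float, target_conc: float, final_volume: float):
    """Return (stock volume, buffer volume) giving final_volume at
    target_conc; concentrations must share units, as must volumes."""
    if target_conc > stock_conc:
        raise ValueError("target concentration exceeds stock concentration")
    stock_volume = target_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume


# Hypothetical reading: A260 of 0.5 measured on a 1:10 dilution.
stock = dsdna_concentration(0.5, dilution_factor=10)
print(stock)
print(dilution_volumes(stock, target_conc=2.5, final_volume=100.0))
```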

Other methods for purification of DNA for microinjection are described in Hogan et al.
(1986); in Palmiter et al. (1982); in The Qiagenologist, Application Protocols, 3rd edition,
published by Qiagen, Inc., Chatsworth, CA.; and in Sambrook et al. (2001).

Female mice are induced to superovulate, e.g., by using an injection of pregnant mare
serum gonadotropin (PMSG; Sigma) followed, 48 hours later, by an injection of human
chorionic gonadotropin (hCG; Sigma). Females are placed with males immediately after hCG
injection. Twenty-one hours after hCG injection, the mated females are sacrificed by CO2
asphyxiation or cervical dislocation and embryos are recovered from excised oviducts and placed
in Dulbecco's phosphate buffered saline with 0.5% bovine serum albumin (BSA; Sigma).
Surrounding cumulus cells are removed with hyaluronidase (1 mg/ml). Pronuclear embryos are
then washed and placed in Earle's balanced salt solution containing 0.5% BSA (EBSS) in a
37.5°C incubator with a humidified atmosphere at 5% CO2, 95% air until the time of injection.
Embryos can be implanted at the two-cell stage.

Twenty-five µg of a SalI-linearized SGC targeting vector is electroporated into 1 x 10^7
embryonic stem (ES) cells. After a suitable period of incubation, e.g., 36 hr, the transfected cells
are then selected using G418 and FIAU. The G418-FIAU-resistant ES colonies are picked into
96-well plates (Ramirez-Solis et al., 1993). Positive ES clones are injected into C57BL/6
blastocysts and transferred into pseudopregnant ICR female recipients. At the time of embryo 
transfer, the recipient females are anesthetized with an intraperitoneal injection of 0.015 ml of
2.5% avertin per gram of body weight. The oviducts are exposed by a single midline dorsal
incision. An incision is then made through the body wall directly over the oviduct. The ovarian
bursa is then torn with watchmaker's forceps. Embryos to be transferred are placed in DPBS
(Dulbecco's phosphate buffered saline) and in the tip of a transfer pipet (about 10 to 12
embryos). The pipet tip is inserted into the infundibulum and the embryos transferred. After the
transfer, the incision is closed by two sutures. 

The resulting male chimeras are bred with C57BL/6 females. Germline transmission can 
be screened by using a phenotype, such as coat color, and confirmed by Southern analysis.

As noted above, transgenic animals and cell lines derived from such animals may find 
use in certain testing experiments. In this regard, HA4 transgenic animals and cell lines may be
exposed to test substances. These test substances can be screened for the ability to induce 
differentiation of cells to osteoblasts. Compounds identified by such procedures will be useful in 
the treatment of bone disorders such as osteoporosis. Thus the compounds identified may be 
used to prevent, treat, or ameliorate bone loss.

(i) ES Cells 

ES cells are obtained from pre-implantation embryos cultured in vitro (Evans et al., 1981;
Bradley et al., 1984; Gossler et al., 1986; Robertson et al., 1986). Transgenes are introduced
into ES cells using a number of means well known to those of skill in the art. The transformed
ES cells can thereafter be combined with blastocysts from a non-human animal. The ES cells
thereafter colonize the embryo and contribute to the germ line of the resulting chimeric animal 
(for a review see Jaenisch, 1988). 

Once the DNA is introduced, e.g., by electroporation (Quillet et al., 1988; Machy et al.,
1988), the cells are cultured under conventional conditions well known in the art. In order to
facilitate the recovery of those cells which have received the DNA molecule containing the
desired gene sequence, it is preferable to introduce the DNA containing the desired gene 
sequence in combination with a second gene sequence which would contain a detectable marker 
gene sequence. For the purposes of the present invention, any gene sequence whose presence in 
a cell permits one to recognize and clonally isolate the cell may be employed as a detectable
(selectable) marker gene sequence. The presence of the detectable (selectable) marker sequence
in a recipient cell may be recognized by PCR, by detection of radiolabeled nucleotides, or by
other assays of detection which do not require the expression of the detectable marker sequence.
Typically, the detectable marker gene sequence will be expressed in the recipient cell, and will
result in a selectable phenotype. Selectable markers are well known to those of skill in the art.
Some examples include the hprt gene, the neo gene, the tk (thymidine kinase) gene of herpes
simplex virus (Giphart-Gassler et al., 1989), or other genes which confer resistance to amino
acid or nucleoside analogues, or antibiotics, etc.

Any ES cell may be used in accordance with the present invention. It is, however,
preferred to use primary isolates of ES cells. Such isolates may be obtained directly from
embryos such as the CCE cell line, or from the clonal isolation of ES cells from the CCE cell
line (Schwartzberg et al., 1989). The purpose of such clonal propagation is to obtain ES cells
which have a greater efficiency for differentiating into an animal. Clonally selected ES cells are
approximately 10-fold more effective in producing transgenic animals than the progenitor cell
line CCE.

(ii) Homologous recombination 

Homologous recombination (Koller and Smithies, 1992) directs the insertion of the
transgene to a specific location. This technique allows the precise modification of existing
genes, and overcomes the problems of positional effects and insertional inactivation observed
with transgenic animals generated by pronuclear injection or use of viral vectors. Additionally,
it allows the inactivation of specific genes as well as the replacement of one gene for another. In
particular embodiments, the DNA segment comprises two selected DNA regions that flank the
HA4 coding region, thereby directing the homologous recombination of the coding region into
the genomic DNA of a non-human animal species.

Thus, a preferred method for the delivery of transgenic constructs involves the use of 
homologous recombination. Homologous recombination relies, like antisense, on the tendency 
of nucleic acids to base pair with complementary sequences. In this instance, the base pairing
serves to facilitate the interaction of two separate nucleic acid molecules so that strand breakage
and repair can take place. In other words, the "homologous" aspect of the method relies on 
sequence homology to bring two complementary sequences into close proximity, while the 
"recombination" aspect provides for one complementary sequence to replace the other by virtue 
of the breaking of certain bonds and the formation of others. 

Put into practice, homologous recombination is used as follows. First, the target gene is
selected within the host cell. Sequences homologous to the target gene are then included in a
genetic construct, along with some mutation that will render the target gene inactive (stop codon,
interruption, and the like). The homologous sequences flanking the inactivating mutation are
said to "flank" the mutation. Flanking, in this context, simply means that target homologous
sequences are located both upstream (5') and downstream (3') of the mutation. These sequences
should correspond to some sequences upstream and downstream of the target gene. The
construct is then introduced into the cell, thus permitting recombination between the cellular
sequences and the construct.

As a practical matter, the genetic construct will normally act as far more than a vehicle to 
interrupt the gene. For example, it is important to be able to select for recombinants and, 
therefore, it is common to include within the construct a selectable marker gene. This gene 
permits selection of cells that have integrated the construct into their genomic DNA by
conferring resistance to various biostatic and biocidal drugs. In addition, a heterologous gene
that is to be expressed in the cell also may advantageously be included within the construct. The 
arrangement might be as follows: 

...vector•5'-flanking sequence•heterologous gene•selectable marker gene•flanking sequence-3'•vector...

Thus, using this kind of construct, it is possible, in a single recombinatorial event, to (i) "knock 
out" an endogenous gene, (ii) provide a selectable marker for identifying such an event and (iii) 
introduce a transgene for expression. 

Another refinement of the homologous recombination approach involves the use of a
"negative" selectable marker. This marker, unlike the selectable marker, causes death of cells
which express the marker. Thus, it is used to identify undesirable recombination events. When 
seeking to select homologous recombinants using a selectable marker, it is difficult in the initial 
screening step to identify proper homologous recombinants from recombinants generated from
random, non-sequence specific events. These recombinants also may contain the selectable
marker gene and may express the heterologous protein of interest, but will, in all likelihood, not 
have the desired "knock out" phenotype. By attaching a negative selectable marker to the 
construct, but outside of the flanking regions, one can select against many random recombination 
events that will incorporate the negative selectable marker. Homologous recombination should
not introduce the negative selectable marker, as it is outside of the flanking sequences.
Examples of processes that use negative selection to enrich for homologous recombination
include the disruption of targeted genes in embryonic stem cells or transformed cell lines
(Mortensen, 1993; Willnow and Herz, 1994) and the production of recombinant virus such as
adenovirus (Imler et al., 1995).
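
The logic of combining a selectable marker with a negative selectable marker can be sketched in a few lines of Python. The sketch below is a simplified model only: it assumes a neo marker inside the flanking sequences (conferring G418 resistance) and a tk marker outside them (conferring FIAU sensitivity), and it treats each clone as simply having or lacking each marker. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clone:
    name: str
    has_neo: bool  # selectable marker, located between the flanking sequences
    has_tk: bool   # negative selectable marker, located outside the flanking sequences


def survives_selection(clone):
    """A clone survives G418 only if it carries neo, and FIAU only if it lacks tk."""
    return clone.has_neo and not clone.has_tk


clones = [
    # Homologous recombination exchanges only the sequence between the flanks.
    Clone("homologous recombinant", has_neo=True, has_tk=False),
    # Random integration typically carries the whole construct, tk included.
    Clone("random integrant, intact construct", has_neo=True, has_tk=True),
    # Some random integrants lose tk, so the enrichment is not absolute.
    Clone("random integrant, tk lost", has_neo=True, has_tk=False),
    Clone("untransfected cell", has_neo=False, has_tk=False),
]

survivors = [c.name for c in clones if survives_selection(c)]

if __name__ == "__main__":
    for name in survivors:
        print(name)
```

Because a random integrant that has lost the negative marker also survives in this model, double selection enriches for, but does not prove, homologous recombination; surviving clones still need to be confirmed, for example by Southern analysis.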

Since the frequency of gene targeting is heavily influenced by the origin of the DNA 
being used for targeting, it is beneficial to obtain DNA that is as similar (isogenic) to the cells
being targeted as possible. One way to accomplish this is by isolation of the region of interest
from genomic DNA from a single colony by long range PCR. Using long range PCR it is 
possible to isolate fragments of 7-12 kb from small amounts of starting DNA. 

Gene trapping is a useful technique suitable for use with the present invention. This 
refers to the utilization of the endogenous regulatory regions present in the chromosomal DNA
to activate the incoming transgene. In this way expression of the transgene is absent or
minimized when the transgene inserts in a random location. However, when homologous
recombination occurs the endogenous regulatory regions are placed in apposition to the incoming
transgene, which results in expression of the transgene. 

(iii) Site Specific Recombination

Members of the integrase family are proteins that bind to a DNA recognition sequence, 
and are involved in DNA recognition, synapsis, cleavage, strand exchange, and religation. 
Currently, the family of integrases includes 28 proteins from bacteria, phage, and yeast which 
have a common invariant His-Arg-Tyr triad (Abremski and Hoess, 1992). Four of the most
widely used site-specific recombination systems for eukaryotic applications include: Cre-loxP
from bacteriophage P1 (Austin et al., 1981); FLP-FRT from the 2µ plasmid of Saccharomyces
cerevisiae (Andrews et al., 1986); R-RS from Zygosaccharomyces rouxii (Maeser and Kahmann,
1991) and gin-gix from bacteriophage Mu (Onouchi et al., 1995). The Cre-loxP and FLP-FRT
systems have been developed to a greater extent than the latter two systems. The R-RS system,
like the Cre-loxP and FLP-FRT systems, requires only the protein and its recognition site. The
Gin recombinase selectively mediates DNA inversion between two inversely oriented 
recombination sites (gix) and requires the assistance of three additional factors: negative 
supercoiling, an enhancer sequence and its binding protein Fis. 

The present invention contemplates the use of the Cre/Lox site-specific recombination
system (Sauer, 1993; Gibco/BRL, Inc., Gaithersburg, Md.) to rescue specific genes out of a
genome, and to excise specific transgenic constructs from the genome. The Cre (causes
recombination)-loxP (locus of crossing-over(x)) recombination system, isolated from
bacteriophage P1, requires only the Cre enzyme and its loxP recognition site on both partner
molecules (Sternberg and Hamilton, 1981). The loxP site consists of two symmetrical 13 bp
protein binding regions separated by an 8 bp spacer region, which is recognized by the Cre
recombinase, a 35 kDa protein. Nucleic acid sequences for loxP (Hoess et al., 1982) and Cre
(Sternberg et al., 1986) are known. If the two loxP sites are cis to each other, an excision
reaction occurs; however, if the two sites are trans to one another, an integration event occurs.
The Cre protein catalyzes a site-specific recombination event. This event is bidirectional, i.e.,
Cre will catalyze the insertion of sequences at a loxP site or excise sequences that lie between
two loxP sites. Thus, if a construct for insertion also has flanking loxP sites, introduction of the
Cre protein, or a polynucleotide encoding the Cre protein, into the cell will catalyze the removal
of the construct DNA. This technology is enabled in U.S. Patent 4,959,317, which is hereby
incorporated by reference in its entirety. 
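
The structure of the loxP site and the outcome of an excision reaction can be illustrated with a short Python sketch. The 34 bp sequence used below is the commonly cited wild-type loxP sequence and is an assumption here, since the text does not reproduce it. The excision function is a deliberately simplified string model of recombination between two directly repeated sites on one molecule, and all function names are hypothetical.

```python
# Assumption: commonly cited wild-type loxP sequence (34 bp); not given in the text.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"

_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def split_loxp(site):
    """Split a 34 bp site into its 13 bp arm, 8 bp spacer and 13 bp arm."""
    if len(site) != 34:
        raise ValueError("a loxP site is expected to be 34 bp long")
    return site[:13], site[13:21], site[21:]


def excise_between_loxp(dna, site=LOXP):
    """Model excision between the first two directly repeated sites.

    Returns (remaining, excised). One site stays in the remaining molecule;
    the excised segment (a circle in reality, shown linearized) carries the other.
    If fewer than two sites are present, the DNA is returned unchanged.
    """
    first = dna.find(site)
    if first == -1:
        return dna, ""
    second = dna.find(site, first + len(site))
    if second == -1:
        return dna, ""
    remaining = dna[: first + len(site)] + dna[second + len(site):]
    excised = dna[first + len(site): second] + site
    return remaining, excised


if __name__ == "__main__":
    left, spacer, right = split_loxp(LOXP)
    print(left, spacer, right)
    print("arms are inverted repeats:", reverse_complement(left) == right)
    floxed = "AAAA" + LOXP + "GGGGCCCC" + LOXP + "TTTT"  # hypothetical floxed cassette
    print(excise_between_loxp(floxed))
```

The asymmetric 8 bp spacer gives the site its orientation, which is why the relative arrangement of two sites determines the outcome of the reaction.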

An initial in vivo study in bacteria showed that Cre excises loxP-flanked DNA
extrachromosomally in cells expressing the recombinase (Abremski et al., 1983). A major
question regarding this system was whether site-specific recombination in eukaryotes could be 
promoted by a bacterial protein. However, Sauer (1987) showed that the system excises DNA in 
S. cerevisiae with the same level of efficiency as in bacteria. 

Further studies with the Cre-loxP system, in particular the ES cell system in mice, have
demonstrated the usefulness of the excision reaction for the generation of unique transgenic
animals. Homologous recombination followed by Cre-mediated deletion of a loxP-flanked 
neo-tk cassette was used to introduce mutations into ES cells. This strategy was repeated for a 
total of 4 rounds in the same line to alter both alleles of the rep-3 and mMsh2 loci, genes 
involved in DNA mismatch repair (Abuin and Bradley, 1996). Similarly, a transgene which 
consists of the 35S promoter/luciferase gene/loxP/35S promoter/hpt gene/loxP (luc+hyg+) was
introduced into tobacco. Subsequent treatment with Cre causes the deletion of the hyg gene
(luc+hygs) at 50% efficiency (Dale and Ow, 1991). Transgenic mice which have the Ig light
chain κ constant region targeted with a loxP-flanked neo gene were bred to Cre-producing mice
to remove the selectable marker from the early embryo (Lakso et al., 1996). This general
approach for removal of markers stems from issues raised by regulatory groups and consumers 
concerned about the introduction of new genes into a population. 

An analogous system contemplated for use in the present invention is the FLP/FRT 
system. This system was used to target the histone 4 gene in mouse ES cells with a FRT-flanked 
neo cassette followed by deletion of the marker by FLP-mediated recombination. The FLP
protein could be obtained from an inducible promoter driving the FLP or by using the protein
itself (Wigley et al., 1994).

The present invention also contemplates the use of recombination activating genes 
(RAG) 1 and 2 to excise specific transgenic constructs from the genome, as well as to rescue
specific genes from the genome. RAG-1 (GenBank accession number M29475) and RAG-2
(GenBank accession numbers M64796 and M33828) recognize specific recombination signal
sequences (RSSs) and catalyze V(D)J recombination required for the assembly of
immunoglobulin and T cell receptor genes (Schatz et al., 1989; Oettinger et al., 1990; Cuomo
and Oettinger, 1994). Transgenic expression of RAG-1 and RAG-2 proteins in non-lymphoid
cells supports V(D)J recombination of reporter substrates (Oettinger et al., 1990). For use in the
present invention, the transforming construct of interest is engineered to contain flanking RSSs. 
Following transformation, the transforming construct that is internal to the RSSs can be deleted 
from the genome by the transient expression of RAG-1 and RAG-2 in the transformed cell. 

V. Clinical Application of HA4-Related Reagents

It is further contemplated that the HA4 related agents described herein, i.e., HA4 proteins 
or polypeptides, antibodies raised against such proteins or polypeptides, mutated, truncated or 
elongated forms of HA4, antibodies raised against such forms, cells engineered to overproduce 
or lack HA4, proteins that interact with HA4, and agents that stimulate, activate, inhibit or
modulate HA4 gene expression may be used to promote or inhibit bone formation. That is, they
may be used for the treatment of bone disorders, such as osteoporosis, glucocorticoid induced 
osteoporosis, Paget's disease, abnormally increased bone turnover, periodontal disease, tooth 
loss, bone fractures, rheumatoid arthritis, periprosthetic osteolysis, osteogenesis imperfecta, 
metastatic bone disease, hypercalcemia of malignancy and the like. 

A. Screens for Reagents that Modulate HA4 Expression and Function 

One may determine whether candidate substances may affect the expression of HA4 by 
osteoblasts. Cells will be treated with candidate substances either individually or in combination 
and then examined for HA4 expression at the levels of mRNA, protein, and function.
Alternatively, those candidate substances may be tested in vivo by administration to living
animals. In one example, osteoblasts will be isolated from those mice after treatment and then
examined in vitro for HA4 expression, once again, at the levels of mRNA, protein, and function.
In performing these assays, it will be important to also examine the effect(s) of candidate
substances on the expression of different isoforms of HA4. In another embodiment,
experimental animals will be assessed for in vivo alterations in bone conditions.

Thus, in one embodiment, the present invention is directed to a method for determining 
the ability of a candidate substance to stimulate the osteoblast activation process, the method
including generally the steps of:

(a) providing a composition comprising a population of cells expressing HA4;

(b) incubating the composition with a candidate substance;

(c) assessing HA4 expression or function; and

(d) identifying a candidate substance that modulates HA4 expression or function.



Naturally, one would measure or determine HA4 expression/function of the composition in the
absence of the added candidate substance as a control. A candidate substance which increases
osteoblast development or HA4 expression relative to the activity/expression in its absence is
indicative of a candidate substance with stimulatory capability.
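
The comparison against an untreated control can be expressed as a simple fold-change calculation. The Python sketch below is illustrative only: the 1.5-fold cutoff, the substance names and the expression values are hypothetical assumptions, and the text does not prescribe any particular threshold or statistic.

```python
def fold_change(treated_level, control_level):
    """Return treated expression relative to the untreated control."""
    if control_level <= 0:
        raise ValueError("control_level must be positive")
    return treated_level / control_level


def classify_candidates(control_level, treated_levels, threshold=1.5):
    """Label each candidate by its effect on HA4 expression.

    threshold is an assumed cutoff: a fold change at or above it is called
    'stimulatory', at or below its reciprocal 'inhibitory', else 'no effect'.
    """
    if threshold <= 1:
        raise ValueError("threshold must be greater than 1")
    labels = {}
    for name, level in treated_levels.items():
        change = fold_change(level, control_level)
        if change >= threshold:
            labels[name] = "stimulatory"
        elif change <= 1 / threshold:
            labels[name] = "inhibitory"
        else:
            labels[name] = "no effect"
    return labels


if __name__ == "__main__":
    # Hypothetical HA4 mRNA levels in arbitrary units.
    measurements = {"substance A": 250.0, "substance B": 40.0, "substance C": 110.0}
    print(classify_candidates(control_level=100.0, treated_levels=measurements))
```

In practice replicate measurements and an appropriate statistical test would be needed before calling a candidate stimulatory; the single cutoff here only shows where the control enters the comparison.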

It will, of course, be understood that all the screening methods of the present invention 
are useful in themselves notwithstanding the fact that effective candidates may not be found, 
since it would be a practical utility to know that HA4 agonists and/or antagonists do not exist. 
The invention provides methods for screening for such candidates, not in finding them. 

Candidate molecules may augment HA4 action without actually affecting HA4
expression or function directly. To test this possibility, test samples will include a suitable cell,
HA4 polypeptide or nucleic acids, and a candidate substance. Read out for the assay will be as 
discussed above. 

Any molecule can be a candidate molecule for the purposes of the present invention, for
example, from a variety of natural sources. It is envisioned that candidate molecules will be
designed and created most effectively using well known combinatorial chemistry techniques,
such as those described in VanHijfte et al. (1999) and Floyd et al. (1999), incorporated herein by
reference. 

B. Therapies Using HA4

As HA4 is involved in bone formation, it may be effectively used for the treatment of
bone disorders, such as osteoporosis, glucocorticoid induced osteoporosis, Paget's disease,
abnormally increased bone turnover, periodontal disease, tooth loss, bone fractures, rheumatoid
arthritis, periprosthetic osteolysis, osteogenesis imperfecta, metastatic bone disease,
hypercalcemia of malignancy and the like.

1. Protein Therapy of HA4 

A therapy approach is the provision, to a subject, of HA4 polypeptide, active fragments,
synthetic peptides, mimetics or other analogs thereof. The protein may be produced by
recombinant expression means or, for smaller peptides, generated by a peptide synthesizer.
Formulations would be selected based on the route of administration and purpose, including but
not limited to liposomal formulations and classic pharmaceutical preparations.

2. Genetic-Based Therapies with HA4 

One of the therapeutic embodiments contemplated by the present inventors is the 
intervention, at the molecular level, in the events involved in the bone formation. Specifically, 
the present inventors intend to provide, to a bone cell or a precursor cell, an expression construct
capable of providing a HA4 polypeptide to that cell. Because of the sequence homology between
the human and mouse genes, either of these nucleic acids could be used in human therapy, as
could any of the gene sequence variants which would encode the same, or a biologically
equivalent polypeptide. The lengthy discussion above of expression vectors and the genetic
elements employed therein is incorporated into this section by reference. Particularly preferred
expression vectors are viral vectors, discussed elsewhere in this document.

Those of skill in the art are well aware of how to apply gene delivery to in vivo and ex 
vivo situations. For viral vectors, one generally will prepare a viral vector stock. Depending on 
the kind of virus and the titer attainable, one will deliver 1 to 100, 10 to 50, 100-1000, or up to
1 x 10^4, 1 x 10^5, 1 x 10^6, 1 x 10^7, 1 x 10^8, 1 x 10^9, 1 x 10^10, 1 x 10^11, or 1 x 10^12
infectious particles to the patient. Similar figures may be extrapolated for liposomal or other
non-viral formulations by comparing relative uptake efficiencies. Formulation as a
pharmaceutically acceptable
composition is discussed below. 
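
Because the deliverable dose depends on the titer attainable, the volume of vector stock needed for a chosen number of infectious particles follows from a single division. The Python sketch below shows that arithmetic; the function names and the example titer and dose are hypothetical and are not recommendations.

```python
def volume_for_dose(dose_particles, titer_per_ml):
    """Return the volume in ml of vector stock containing the requested dose."""
    if dose_particles <= 0 or titer_per_ml <= 0:
        raise ValueError("dose_particles and titer_per_ml must be positive")
    return dose_particles / titer_per_ml


def doses_per_stock(stock_volume_ml, titer_per_ml, dose_particles):
    """Return how many whole doses a stock of the given volume can supply."""
    if stock_volume_ml <= 0 or titer_per_ml <= 0 or dose_particles <= 0:
        raise ValueError("all arguments must be positive")
    total_particles = stock_volume_ml * titer_per_ml
    return int(total_particles // dose_particles)


if __name__ == "__main__":
    # Hypothetical stock: 2 ml at 1 x 10^10 infectious particles per ml.
    titer = 1e10
    dose = 1e9
    print(f"volume per dose: {volume_for_dose(dose, titer)} ml")
    print(f"doses available: {doses_per_stock(2.0, titer, dose)}")
```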

Various routes are contemplated for different disease types. The section below on routes 
contains an extensive list of possible routes. In a different embodiment, ex vivo gene therapy is
contemplated. In an ex vivo embodiment, cells from the patient are removed and maintained
outside the body for at least some period of time. During this period, a HA4 gene is delivered to
these cells, after which the cells are reintroduced into the patient.

In some embodiments of the present invention a subject is exposed to a viral vector and
the subject is then monitored for expression construct-based toxicity, where such toxicity may
include, among other things, causing a condition that is injurious to the subject.

3. Pharmaceutical Formulations and Delivery

In a preferred embodiment of the present invention, a method of treatment for a bone 
disorder by the delivery of an expression construct encoding a HA4 polypeptide is contemplated. 
Bone disorders, such as osteoporosis, glucocorticoid induced osteoporosis, Paget's disease, 
abnormally increased bone turnover, periodontal disease, tooth loss, bone fractures, rheumatoid
arthritis, periprosthetic osteolysis, osteogenesis imperfecta, metastatic bone disease,
hypercalcemia of malignancy and the like may be treated. 

An effective amount of the pharmaceutical composition, generally, is defined as that 
amount sufficient to detectably and repeatedly ameliorate, reduce, minimize or limit the extent
of the disease or its symptoms. More rigorous definitions may apply, including elimination,
eradication or cure of disease.

The therapeutic expression construct expressing an HA4 polypeptide may be 
administered by any of the routes and the route of administration will vary, naturally, with the 
location and nature of the lesion, and include, e.g., intradermal, transdermal, parenteral, 
intravenous, intramuscular, intranasal, subcutaneous, percutaneous, intratracheal, intraperitoneal,
intratumoral, perfusion, lavage, direct injection, and oral administration and formulation.
Treatment regimens may vary as well, and often depend on disease progression, and health and 
age of the patient. The clinician will be best suited to make such decisions based on the known 
efficacy and toxicity (if any) of the therapeutic formulations. 

The treatments may include various "unit doses." Unit dose is defined as containing a
predetermined quantity of the therapeutic composition. The quantity to be administered, and the
particular route and formulation, are within the skill of those in the clinical arts. A unit dose
need not be administered as a single injection but may comprise continuous infusion over a set
period of time. Unit dose of the present invention may conveniently be described in terms of
plaque forming units (pfu) for a viral construct. Unit doses range from 10^3, 10^4, 10^5, 10^6, 10^7,
10^8, 10^9, 10^10, 10^11, 10^12, 10^13 pfu and higher. Alternatively, depending on the kind of virus and
the titer attainable, one will deliver 1 to 100, 10 to 50, 100-1000, or up to about 1 x 10^4, 1 x 10^5,
1 x 10^6, 1 x 10^7, 1 x 10^8, 1 x 10^9, 1 x 10^10, 1 x 10^11, 1 x 10^12, 1 x 10^13, 1 x 10^14, or 1 x 10^15 or
higher infectious viral particles (vp) to the patient or to the patient's cells.

Injection of nucleic acid constructs may be delivered by syringe or any other method
used for injection of a solution, as long as the expression construct can pass through the
particular gauge of needle required for injection. A novel needleless injection system has
recently been described (U.S. Patent 5,846,233) having a nozzle defining an ampule chamber for
holding the solution and an energy device for pushing the solution out of the nozzle to the site of
delivery. A syringe system has also been described for use in gene therapy that permits multiple
injections of predetermined quantities of a solution precisely at any depth (U.S. Patent
5,846,225).

Solutions of the active compounds as free base or pharmacologically acceptable salts
may be prepared in water suitably mixed with a surfactant, such as hydroxypropylcellulose.
Dispersions may also be prepared in glycerol, liquid polyethylene glycols, and mixtures thereof 
and in oils. Under ordinary conditions of storage and use, these preparations contain a 
preservative to prevent the growth of microorganisms. The pharmaceutical forms suitable for 
injectable use include sterile aqueous solutions or dispersions and sterile powders for the
extemporaneous preparation of sterile injectable solutions or dispersions (U.S. Patent 5,466,468,
specifically incorporated herein by reference in its entirety). In all cases the form must be sterile 
and must be fluid to the extent that easy syringability exists. It must be stable under the 
conditions of manufacture and storage and must be preserved against the contaminating action of 
microorganisms, such as bacteria and fungi. The carrier can be a solvent or dispersion medium
containing, for example, water, ethanol, polyol (e.g., glycerol, propylene glycol, and liquid
polyethylene glycol, and the like), suitable mixtures thereof, and/or vegetable oils. Proper 
fluidity may be maintained, for example, by the use of a coating, such as lecithin, by the 
maintenance of the required particle size in the case of dispersion and by the use of surfactants. 
The prevention of the action of microorganisms can be brought about by various antibacterial
and antifungal agents, for example, parabens, chlorobutanol, phenol, sorbic acid, thimerosal, and
the like. In many cases, it will be preferable to include isotonic agents, for example, sugars or
sodium chloride. Prolonged absorption of the injectable compositions can be brought about by
the use in the compositions of agents delaying absorption, for example, aluminum monostearate 
and gelatin. 

For parenteral administration in an aqueous solution, for example, the solution should be
suitably buffered if necessary and the liquid diluent first rendered isotonic with sufficient saline
or glucose. These particular aqueous solutions are especially suitable for intravenous, 
intramuscular, subcutaneous, intratumoral and intraperitoneal administration. In this connection,
sterile aqueous media that can be employed will be known to those of skill in the art in light of
the present disclosure. For example, one dosage may be dissolved in 1 ml of isotonic NaCl
solution and either added to 1000 ml of hypodermoclysis fluid or injected at the proposed site of
infusion (see for example, "Remington's Pharmaceutical Sciences" 15th Edition, pages 1035-
1038 and 1570-1580). Some variation in dosage will necessarily occur depending on the
condition of the subject being treated. The person responsible for administration will, in any 
event, determine the appropriate dose for the individual subject. Moreover, for human 
administration, preparations should meet sterility, pyrogenicity, general safety and purity 
standards as required by FDA Office of Biologics standards.

Sterile injectable solutions are prepared by incorporating the active compounds in the 
required amount in the appropriate solvent with various of the other ingredients enumerated 
above, as required, followed by filter sterilization. Generally, dispersions are prepared by
incorporating the various sterilized active ingredients into a sterile vehicle which contains the 
basic dispersion medium and the required other ingredients from those enumerated above. In the 
case of sterile powders for the preparation of sterile injectable solutions, the preferred methods of 
preparation are vacuum-drying and freeze-drying techniques which yield a powder of the active
ingredient plus any additional desired ingredient from a previously sterile-filtered solution 
thereof. 

The compositions disclosed herein may be formulated in a neutral or salt form. 
Pharmaceutically-acceptable salts include the acid addition salts (formed with the free amino
groups of the protein) and which are formed with inorganic acids such as, for example, 
hydrochloric or phosphoric acids, or such organic acids as acetic, oxalic, tartaric, mandelic, and 
the like. Salts formed with the free carboxyl groups can also be derived from inorganic bases 
such as, for example, sodium, potassium, ammonium, calcium, or ferric hydroxides, and such 
organic bases as isopropylamine, trimethylamine, histidine, procaine and the like. Upon 
formulation, solutions will be administered in a manner compatible with the dosage formulation 
and in such amount as is therapeutically effective. The formulations are easily administered in a 
variety of dosage forms such as injectable solutions, drug release capsules and the like. 

As used herein, "carrier" includes any and all solvents, dispersion media, vehicles, 
coatings, diluents, antibacterial and antifungal agents, isotonic and absorption delaying agents, 
buffers, carrier solutions, suspensions, colloids, and the like. The use of such media and agents 
for pharmaceutical active substances is well known in the art. Except insofar as any 
conventional media or agent is incompatible with the active ingredient, its use in the therapeutic
compositions is contemplated. Supplementary active ingredients can also be incorporated into
the compositions. 

The phrase "pharmaceutically-acceptable" or "pharmacologically-acceptable" refers to 
molecular entities and compositions that do not produce an allergic or similar untoward reaction 
when administered to a human. The preparation of an aqueous composition that contains a 
protein as an active ingredient is well understood in the art. Typically, such compositions are 
prepared as injectables, either as liquid solutions or suspensions; solid forms suitable for solution 
in, or suspension in, liquid prior to injection can also be prepared. The terms "contacted" and 
"exposed," when applied to a cell, are used herein to describe the process by which a therapeutic 
construct encoding a HA4 polypeptide is delivered to a target cell. 



C. Diagnostic Applications 

In accordance with the present invention, it will also be useful to examine the structure 
and/or activity of HA4 in cells of a subject. The assays described in the previous section for 
examining protein levels, mRNA levels, and DNA structure may be applied to the endeavor of 
examining a clinical sample for defects in HA4. In particular, identification of HA4 in 
circulation would indicate that serum levels could be used as a diagnostic measure of bone 
density. 

Assays to assess the level of expression of a polypeptide are also well known to those of 
skill in the art. This can be accomplished also by assaying for HA4 mRNA levels, mRNA 
stability or turnover, as well as protein expression levels. It is further contemplated that any 
post-translational processing of HA4 may also be assessed, as well as whether it is being 
localized or regulated properly. In some cases an antibody that specifically binds HA4 may be 
used. Assays for HA4 activity also may be used. 


1. Northern Blotting Techniques 

The present invention therefore employs Northern blotting in assessing the expression of 
HA4 in a cell such as a chondrogenic cell, osteoblastic cell, or myoblastic cell, but is not limited 
to such. The techniques involved in Northern blotting are commonly used in molecular biology 
and are well known to one of skill in the art. These techniques can be found in many standard 
books on molecular protocols (e.g., Sambrook et al., 2001). This technique allows for the 
detection of RNA, i.e., by hybridization with a labeled probe. 

Briefly, RNA is separated by gel electrophoresis. The gel is then contacted with a 
membrane, such as nitrocellulose, permitting transfer of the nucleic acid and non-covalent 
binding. Subsequently, the membrane is incubated with, e.g., a chromophore-conjugated probe 
that is capable of hybridizing with a target amplification product. Detection is by exposure of 
the membrane to x-ray film or ion-emitting detection devices. 

U.S. Patent 5,279,721, incorporated by reference herein, discloses an apparatus and 
method for the automated electrophoresis and transfer of nucleic acids. The apparatus permits 
electrophoresis and blotting without external manipulation of the gel and is ideally suited to 
carrying out methods according to the present invention. 


2. Quantitative RT-PCR 

The present invention also employs quantitative RT-PCR in assessing the expression or 
activity of HA4 in a cell such as a chondrogenic cell, osteoblastic cell, or myoblastic cell, but is 
not limited to such. Reverse transcription (RT) of RNA to cDNA followed by relative 
quantitative PCR™ (RT-PCR) can be used to determine the relative concentrations of specific 
mRNA species, such as a HA4 transcript, isolated from a cell. By determining that the 
concentration of a specific mRNA species varies, it is shown that the gene encoding the specific 
mRNA species is differentially expressed. 

In PCR™, the number of molecules of the amplified target DNA increases by a factor 
approaching two with every cycle of the reaction until some reagent becomes limiting. 
Thereafter, the rate of amplification becomes increasingly diminished until there is no 
increase in the amplified target between cycles. If one plots a graph on which the cycle number 
is on the X axis and the log of the concentration of the amplified target DNA is on the Y axis, 
one observes that a curved line of characteristic shape is formed by connecting the plotted points. 
Beginning with the first cycle, the slope of the line is positive and constant. This is said to be the 
linear portion of the curve. After some reagent becomes limiting, the slope of the line begins to 
decrease and eventually becomes zero. At this point the concentration of the amplified target 
DNA becomes asymptotic to some fixed value. This is said to be the plateau portion of the 
curve. 
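
The shape of this curve can be illustrated with a minimal sketch in Python. The saturation model, the function name `simulate_pcr`, and all numeric values below are assumptions chosen only for illustration; they are not taken from the present disclosure.

```python
import math


def simulate_pcr(n0, cycles, capacity, efficiency=1.0):
    """Return the product amount after each cycle (index 0 is the input).

    Assumed toy model: the per-cycle gain equals `efficiency` scaled by the
    fraction of reagent capacity still unused, so the product nearly doubles
    in early cycles and stops increasing as it approaches `capacity`.
    """
    amounts = [float(n0)]
    for _ in range(cycles):
        n = amounts[-1]
        gain = efficiency * max(0.0, 1.0 - n / capacity)
        amounts.append(n * (1.0 + gain))
    return amounts


# Hypothetical reaction: 1e3 starting molecules, reagents for 1e12 molecules.
curve = simulate_pcr(n0=1e3, cycles=40, capacity=1e12)
log_curve = [math.log10(n) for n in curve]

# Slope of the log plot per cycle, early (linear portion) and late (plateau).
early_slope = log_curve[5] - log_curve[4]
late_slope = log_curve[40] - log_curve[39]
```

In this sketch the early slope is close to log10(2) per cycle, which corresponds to the constant positive slope of the linear portion, and the late slope is close to zero, which corresponds to the plateau.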

The concentration of the target DNA in the linear portion of the PCR™ is directly 
proportional to the starting concentration of the target before the PCR™ was begun. By 
determining the concentration of the PCR™ products of the target DNA in PCR™ reactions that 
have completed the same number of cycles and are in their linear ranges, it is possible to 
determine the relative concentrations of the specific target sequence in the original DNA 
mixture. If the DNA mixtures are cDNAs synthesized from RNAs isolated from different cells, 
the relative abundances of the specific mRNA from which the target sequence was derived can 
be determined for the respective tissues or cells. This direct proportionality between the 
concentration of the PCR™ products and the relative mRNA abundances is only true in the 
linear range portion of the PCR™ reaction. 

The final concentration of the target DNA in the plateau portion of the curve is 
determined by the availability of reagents in the reaction mix and is independent of the original 
concentration of target DNA. Therefore, the first condition that must be met before the relative 
abundances of an mRNA species can be determined by RT-PCR for a collection of RNA 
populations is that the concentrations of the amplified PCR™ products must be sampled when 
the PCR™ reactions are in the linear portion of their curves. 
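
As a hedged illustration of this first condition, the following Python sketch compares two hypothetical reactions whose starting templates differ four-fold. The saturation model, the helper name `amplify`, and the numbers are illustrative assumptions rather than part of the disclosed method.

```python
def amplify(n0, cycles, capacity=1e12):
    """Product amount after `cycles` cycles under an assumed toy model in
    which amplification slows as the product approaches `capacity`."""
    n = float(n0)
    for _ in range(cycles):
        n *= 1.0 + max(0.0, 1.0 - n / capacity)
    return n


# Two hypothetical samples with a true 4:1 ratio of starting template.
high_input, low_input = 4e3, 1e3

# Sampled at the same cycle number in the linear portion of the curves.
ratio_linear = amplify(high_input, 15) / amplify(low_input, 15)

# Sampled after both reactions have reached the plateau.
ratio_plateau = amplify(high_input, 40) / amplify(low_input, 40)
```

Under these assumptions the ratio measured at cycle 15 stays close to the true four-fold difference, whereas the ratio measured at cycle 40 collapses toward one because both reactions end at the same reagent-limited amount.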

The second condition that must be met for an RT-PCR study to successfully determine 
the relative abundances of a particular mRNA species is that relative concentrations of the 
amplifiable cDNAs must be normalized to some independent standard. The goal of an RT-PCR 
study is to determine the abundance of a particular mRNA species relative to the average 
abundance of all mRNA species in the sample. In such studies, mRNAs for β-actin, asparagine 
synthetase and lipocortin II may be used as external and internal standards to which the relative 
abundance of other mRNAs is compared. 
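
The normalization step can be sketched as follows in Python. The function names and the signal values are hypothetical and serve only to show the arithmetic of dividing a target signal by the signal of an independent standard measured in the same sample.

```python
def normalized_expression(target_signal, standard_signal):
    """Target signal divided by the signal of the independent standard."""
    if standard_signal <= 0:
        raise ValueError("standard signal must be positive")
    return target_signal / standard_signal


def fold_change(sample, control):
    """Ratio of normalized expression in `sample` to that in `control`.

    Each argument is a (target_signal, standard_signal) pair measured in the
    linear portion of the amplification curve.
    """
    return normalized_expression(*sample) / normalized_expression(*control)


# Hypothetical band intensities: (target, standard) for two RNA samples.
treated = (300.0, 100.0)
untreated = (150.0, 200.0)
change = fold_change(treated, untreated)
```

With these made-up values the raw target signals differ only two-fold, but after normalization to the standard the difference is four-fold, which shows why unequal amounts of amplifiable cDNA must be corrected for before samples are compared.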

Most protocols for competitive PCR™ utilize internal PCR™ standards that are 
approximately as abundant as the target. These strategies are effective if the products of the 
PCR™ amplifications are sampled during their linear phases. If the products are sampled when 
the reactions are approaching the plateau phase, then the less abundant product becomes 
relatively overrepresented. Comparisons of relative abundances made for many different RNA 
samples, such as is the case when examining RNA samples for differential expression, become 
distorted in such a way as to make differences in relative abundances of RNAs appear less than 
they actually are. This is not a significant problem if the internal standard is much more 
abundant than the target. If the internal standard is more abundant than the target, then direct 
linear comparisons can be made between RNA samples. 

The discussion above describes the theoretical considerations for an RT-PCR assay for 
clinically derived materials. The problems inherent in clinical samples are that they are of 
variable quantity (making normalization problematic), and that they are of variable quality 
(necessitating the co-amplification of a reliable internal control, preferably of larger size than the 
target). 

Both of the foregoing problems are overcome if the RT-PCR is performed as a relative 
quantitative RT-PCR with an internal standard in which the internal standard is an amplifiable 
cDNA fragment that is larger than the target cDNA fragment and in which the abundance of the 
mRNA encoding the internal standard is roughly 5-100 fold higher than the mRNA encoding the 
target. This assay measures relative abundance, not absolute abundance of the respective mRNA 
species. 

Other studies are available that use a more conventional relative quantitative RT-PCR 
with an external standard protocol. These assays sample the PCR™ products in the linear 
portion of their amplification curves. The number of PCR™ cycles that are optimal for sampling 
must be empirically determined for each target cDNA fragment. In addition, the reverse 
transcriptase products of each RNA population isolated from the various tissue samples must be 
carefully normalized for equal concentrations of amplifiable cDNAs. This is very important 
since this assay measures absolute mRNA abundance. Absolute mRNA abundance can be used 
as a measure of differential gene expression only in normalized samples. While empirical 
determination of the linear range of the amplification curve and normalization of cDNA 
preparations are tedious and time-consuming processes, the resulting RT-PCR assays can be 
superior to those derived from the relative quantitative RT-PCR with an internal standard. 

One reason for this is that without the internal standard/competitor, all of the reagents 
can be converted into a single PCR™ product in the linear range of the amplification curve, 
increasing the sensitivity of the assay. Another reason is that with only one PCR™ product, 
display of the product on an electrophoretic gel or some other display method becomes less 
complex, has less background and is easier to interpret. 


3. Immunohistochemistry 

The present invention also employs quantitative immunohistochemistry in assessing the 
expression of HA4 in a cell, tissue or organ sample. 

Briefly, frozen-sections may be prepared by rehydrating 50 ng of frozen "pulverized" 
tumor at room temperature in phosphate buffered saline (PBS) in small plastic capsules; pelleting 
the particles by centrifugation; resuspending them in a viscous embedding medium (OCT); 
inverting the capsule and pelleting again by centrifugation; snap-freezing in -70°C isopentane; 
cutting the plastic capsule and removing the frozen cylinder of tissue; securing the tissue 
cylinder on a cryostat microtome chuck; and cutting 25-50 serial sections containing an average 
of about 500 remarkably intact cell, tissue or organ sample. 

Permanent-sections may be prepared by a similar method involving rehydration of the 50 
mg sample in a plastic microfuge tube; pelleting; resuspending in 10% formalin for 4 h fixation; 
washing/pelleting; resuspending in warm 2.5% agar; pelleting; cooling in ice water to harden the 
agar; removing the tissue/agar block from the tube; infiltrating and embedding the block in 
paraffin; and cutting up to 50 serial permanent sections. 

Other immunohistochemistry techniques that may be employed in the present invention 
include tissue microarray immunohistochemistry. This method is a recently developed technique 
that enables the simultaneous examination of multiple tissue sections, as compared 
to the more conventional technique of one section at a time. This technique is used for 
high-throughput molecular profiling of tumor specimens (Kononen et al., 1998). 

4. Western Blotting 

The present invention also employs Western blotting (immunoblotting) 
analysis to assess HA4 activity or expression in a cell such as a chondrogenic cell, osteoblastic 
cell, or myoblastic cell, but is not limited to such. This technique is well known to those of skill 
in the art; see U.S. Patent 4,452,901, incorporated herein by reference, and Sambrook et al. 
(2001). In brief, this technique generally comprises separating proteins in a sample such as a cell 
or tissue sample by SDS-PAGE gel electrophoresis. In SDS-PAGE, proteins are separated on the 
basis of molecular weight and are then transferred to a suitable solid support (such as a 
nitrocellulose filter, a nylon filter, or derivatized nylon filter), followed by incubation of the 
proteins on the solid support with antibodies that specifically bind to the proteins. 

5. ELISA 

The present invention may also employ immunoassays such as an 
enzyme-linked immunosorbent assay (ELISA) in assessing the activity or expression of HA4 in a cell 
such as a chondrogenic cell, osteoblastic cell, or myoblastic cell, but is not limited to such. An 
ELISA generally involves the steps of coating, incubating and binding, washing to remove 
species that are non-specifically bound, and detecting the bound immune complexes. This 
technique is well known in the art; for example, see U.S. Patent 4,367,110 and Harlow and Lane, 
1988. 

In an ELISA assay, a HA4 protein sample may be immobilized onto a selected surface, 
preferably a surface exhibiting a protein affinity such as the wells of a polystyrene microtiter 
plate. After washing to remove incompletely adsorbed material, it is desirable to bind or coat the 
assay plate wells with a nonspecific protein that is known to be antigenically neutral with regard 
to the test antisera such as bovine serum albumin (BSA), casein or solutions of milk powder. 
This allows for blocking of nonspecific adsorption sites on the immobilizing surface and thus 
reduces the background caused by nonspecific binding of antisera onto the surface. 

After binding of the antigenic material to the well, coating with a non-reactive material to 
reduce background, and washing to remove unbound material, the immobilizing surface is 
contacted with the antisera or clinical or biological extract to be tested in a manner conducive to 
immune complex (antigen/antibody) formation. Such conditions preferably include diluting the 
antisera with diluents such as BSA, bovine gamma globulin (BGG) and phosphate buffered 
saline (PBS)/Tween. These added agents also tend to assist in the reduction of nonspecific 
background. The layered antisera is then allowed to incubate for 2 to 4 or more hours to 
allow effective binding, at temperatures preferably on the order of 25°C to 37°C (or overnight at 
4°C). Following incubation, the antisera-contacted surface is washed so as to remove 
non-immunocomplexed material. A preferred washing procedure includes washing with a solution 
such as PBS/Tween, or borate buffer. 

Following formation of specific immunocomplexes between the test sample and the 
bound antigen, and subsequent washing, the occurrence and even amount of immunocomplex 
formation may be determined by subjecting same to a second antibody having specificity for the 
first. To provide a detecting means, the second antibody preferably has an associated enzyme 
that generates a color development upon incubating with an appropriate chromogenic substrate. 
Thus, for example, one will desire to contact and incubate the antisera-bound surface with a 
urease or peroxidase-conjugated anti-human IgG for a period of time and under conditions which 
favor the development of immunocomplex formation (e.g., incubation for 2 hours at room 
temperature in a PBS-containing solution such as PBS-Tween). 

After incubation with the second enzyme-tagged antibody, and subsequent to washing to 
remove unbound material, the amount of label is quantified by incubation with a chromogenic 
substrate such as urea and bromocresol purple or 2,2'-azino-di-(3-ethyl-benzthiazoline-6-sulfonic 
acid) (ABTS) and H2O2, in the case of peroxidase as the enzyme label. Quantification is then 
achieved by measuring the degree of color generation, e.g., using a visible spectra 
spectrophotometer. The use of labels for immunoassays is described in U.S. Patents 5,310,687, 
5,238,808 and 5,221,605. 
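
One common way to turn a measured color intensity into a concentration is to read it against a standard curve. The Python sketch below uses piecewise-linear interpolation between standards; this fitting choice, the function name `interpolate_concentration`, and the standard values are assumptions for illustration only and are not specified by the present disclosure.

```python
from bisect import bisect_left


def interpolate_concentration(absorbance, standards):
    """Estimate concentration from absorbance by linear interpolation.

    `standards` is a list of (concentration, absorbance) pairs in which
    absorbance rises strictly with concentration. Readings outside the
    range of the standards are rejected rather than extrapolated.
    """
    points = sorted(standards, key=lambda pair: pair[1])
    readings = [a for _, a in points]
    if not readings[0] <= absorbance <= readings[-1]:
        raise ValueError("absorbance outside the range of the standards")
    i = bisect_left(readings, absorbance)
    if readings[i] == absorbance:
        return points[i][0]
    (c0, a0), (c1, a1) = points[i - 1], points[i]
    return c0 + (c1 - c0) * (absorbance - a0) / (a1 - a0)


# Hypothetical standards: (concentration in ng/ml, absorbance reading).
standard_curve = [(0.0, 0.05), (10.0, 0.25), (50.0, 0.85), (100.0, 1.45)]
estimate = interpolate_concentration(0.55, standard_curve)
```

In practice a fitted curve model may be preferred over straight-line segments; the sketch only shows how a reading between two standards maps back to a concentration.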

Other immunodetection methods that may be contemplated in the present invention 
include radioimmunoassay (RIA), immunoradiometric assay, fluoroimmunoassay, 
chemiluminescent assay, and bioluminescent assay. These methods are well known to those of 
ordinary skill and have been described in Doolittle et al. (1999); Gulbis et al. (1993); De Jager et 
al. (1993); and Nakamura et al. (1987), each incorporated herein by reference. 

VI. Examples 

The following examples are included to demonstrate preferred embodiments of the 
invention. It should be appreciated by those of skill in the art that the techniques disclosed in the 
examples which follow represent techniques discovered by the inventor to function well in the 
practice of the invention, and thus can be considered to constitute preferred modes for its 
practice. However, those of skill in the art should, in light of the present disclosure, appreciate 
that many changes can be made in the specific embodiments which are disclosed and still obtain 
a like or similar result without departing from the spirit and scope of the invention. 



EXAMPLE 1 
Expression of HA4 in Bone 

The HA4 gene was isolated from a mouse genomic λ-ZAP library by using HA4 cDNA 
as a probe. The HA4 cDNA was obtained by a subtraction screening of BMP-untreated and 
BMP-treated chondrogenic ATDC5 cells, using the Clontech Subtraction-Suppression Kit. 
PCR-amplified cDNAs from ATDC5 cells were subtracted from cDNAs isolated from 
BMP-treated ATDC5 cells. The structure of the mouse gene for HA4 was found to contain 4 exons 
and 3 introns. Sizes of exons and introns are indicated in base pairs (FIG. 1). By radiation 
hybrid mapping, the mouse HA4 gene was mapped to mouse chromosome 15, 8.99 centiRays 
from D15Mit22. The exon-intron and intron-exon junctions are indicated with the splice donor 
and splice acceptor sites in small letters. The genomic sequence of the HA4 exons, introns and 
promoter is provided herein as SEQ ID NO:3. The 2.1 kb promoter that has been used with a 
β-galactosidase reporter is indicated by underlining. Four exons are indicated in bold; the 
beginning of the first exon corresponds to the start site of transcription. The double-underlined 
ATG corresponds to the first methionine residue in HA4. The sequence of HA4 protein was 
found to be 244 amino acids in length. 

Heterozygous HA4 mutant mouse embryo stem (ES) cells were generated by targeted 
recombination. In the targeting vector the E. coli LacZ gene preceded by an internal ribosomal 
entry site (IRES) was inserted in exon 2. In addition 67 bp of exon 2 were deleted. Correctly 
targeted ES cell clones were injected into mouse blastocysts to generate male chimeras, which 
then produced HA4 heterozygous mutant mice. Homozygous null mutant mice were generated 
by conventional mating of heterozygous HA4 mutant mice. 

To analyze the expression of HA4, studies were conducted using Northern blotting. Two 
μg of polyA RNA from different mouse organs was fractionated by electrophoresis in a 1% 
agarose gel, blotted on a nylon membrane and hybridized with a 32P-labeled HA4 cDNA probe. 
The filter was rehybridized with a β-actin cDNA probe to verify equivalent RNA loading. The 
size of HA4 RNA was found to be approximately 1.6 kb. Expression of HA4 mRNA was also 
detected using various cell lines. Expression of HA4 was observed in osteoblastic MC3T3-E1 
cells and undifferentiated ATDC5 cells, while none of C3H10T1/2 cells, myoblastic C2C12 
cells, and Balb/c 3T3 fibroblasts expressed HA4 mRNA in vitro. Moreover, HA4 mRNA was 
expressed at high levels in bone in adult mice (FIG. 2). In a similar experiment, 2 μg of polyA 
RNA of whole mouse embryos was fractionated by electrophoresis in a 1% agarose gel, blotted 
on a nylon membrane and hybridized with a 32P-labeled cDNA probe for HA4. The filter was 
then rehybridized with a β-actin cDNA probe. HA4 expression was analyzed during mouse 
embryogenesis (FIG. 3). 

Paraffin sections were generated and hybridized in situ with a 35S-labeled HA4 RNA 
probe. By in situ hybridization, HA4 expression was found to localize to chondrogenic 
mesenchymal condensations in E13.5 mouse embryos, and in cartilages, bones, and periosteums 
in E16.5 mouse embryos. HA4 was detected in mouse embryo forelimb at E13.5 and in mouse 
embryo elbow at E16.5 (FIG. 4). 

Furthermore, X-gal staining of heterozygous HA4 mutant embryos with a LacZ gene 
inserted into one HA4 allele revealed specific expression of HA4 in bones and cartilages. HA4 
heterozygous mutant embryos at different times of embryonic development were stained with X- 
gal (FIG. 5). FIG. 6 demonstrates embryonic development in a heterozygous HA4 mutant 
embryo at day 15.5 by staining with X-gal. The embryo was made translucent by treatment with 
0.5 percent KOH and 50 percent glycerol. 

Thus, the inventors have identified a secreted polypeptide, HA4, which is expressed 
selectively in osteoblasts. 

EXAMPLE 2 
HA4 Deficient Mice 

To demonstrate a role for HA4 in bone and cartilage metabolism, the HA4 gene was 
inactivated in mouse embryonic stem cells. Homologous recombination was used to produce 
mice that are homozygous-null for HA4. The tibia of 3 month old HA4 null mutant mice was 
examined by microCT and this analysis compared with that of same sex wild type littermates. 
The fraction of bone volume over total volume was found to be markedly reduced in HA4-null 
mutants. This was due both to a decrease in bone trabecular number and to a reduction of 
trabecular thickness (FIG. 7). Thus, HA4 deficient mice were found to have reduced bone 
density. The inventors therefore concluded that HA4 is necessary for normal bone density. This 
phenotype mimics that observed in humans with osteoporosis and provides a model for human 
osteoporosis. 
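
The fraction of bone volume over total volume can be expressed as a simple voxel count on a segmented scan. The Python sketch below assumes a binary three-dimensional mask in which 1 marks bone; the function name `bone_volume_fraction` and the tiny arrays are hypothetical and are not data from the analysis described above.

```python
def bone_volume_fraction(voxels):
    """Return bone volume over total volume for a 3-D binary mask.

    `voxels` is a nested list indexed as [section][row][column], where a
    truthy value marks a voxel segmented as bone.
    """
    total = 0
    bone = 0
    for section in voxels:
        for row in section:
            for voxel in row:
                total += 1
                if voxel:
                    bone += 1
    if total == 0:
        raise ValueError("the volume contains no voxels")
    return bone / total


# Hypothetical 2 x 2 x 2 regions of interest (not measured data).
wild_type_mask = [[[1, 1], [0, 1]], [[1, 0], [0, 1]]]
null_mutant_mask = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]

wild_type_fraction = bone_volume_fraction(wild_type_mask)
null_mutant_fraction = bone_volume_fraction(null_mutant_mask)
```

A lower fraction in one mask than in the other is the kind of difference that a reduced bone volume fraction refers to; trabecular number and thickness require additional morphometric analysis that this sketch does not attempt.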

EXAMPLE 3 
Generation of Transgenic Mice and Detection of HA4 Protein in Serum 

For analyzing the function of HA4 in vivo, transgenic mice were generated in which the 
HA4 protein is overexpressed in osteoblasts. A recombinant DNA which specifies a HA4 tagged 
by 3 tandem copies of a short hemagglutinin (HA) peptide was constructed. The DNA for this 
tagged HA4 was placed under the control of the Col1a1 2.3 kb promoter and transgenic mice 
were generated that express the tagged HA4 protein in osteoblasts. The 2.3 kb Col1a1 promoter 
was specifically activated in osteoblasts. Using an antibody against the hemagglutinin peptide, 
the tagged HA4 protein was detected. The transgenic mice were found to be normal. 
Immunohistochemistry with rabbit anti-HA antibody showed that 3xHA-tagged HA4 protein is 
specifically localized in bones of E18.5 mutant embryos. Blood (200 μl) was collected from 
the heart of the transgenic mice, and the serum was separated by centrifugation. 3xHA-tagged HA4 
protein in the serum (100 μl) was purified with an anti-HA affinity matrix. 3xHA-tagged HA4 
protein bound to the matrix was then extracted with Laemmli SDS buffer and separated by 
SDS-PAGE. The protein was detected by Western blot using mouse monoclonal anti-HA 
antibody as a band corresponding to a size of about 35 kDa (FIG. 8). This experiment indicates that the 
HA4 protein is secreted in the circulation and that the levels of HA4 in serum can be measured. 

EXAMPLE 4 
Production of Recombinant HA4 Protein 

For production of pure recombinant HA4 protein, Flag-tagged, 6xHis-tagged mouse HA4 
cDNA was cloned into the pBACgus-1 plasmid. This vector was transfected in Sf9 insect cells 
and produced a high-titer baculovirus stock. For production of recombinant HA4 protein, Sf9 
cells were infected with HA4-baculovirus at a multiplicity of infection >5. Twenty-four hours 
after infection, the conditioned media was collected and the recombinant protein was purified by 
affinity chromatography with Ni-NTA agarose using a Batch/Gravity-Flow Column purification 
method. The Ni-NTA agarose bound recombinant protein was eluted with 100 mM imidazole. 
The purity of recombinant protein was about 80%. To produce pure recombinant HA4 protein, 
this purified protein was applied onto a MonoQ column, and pure recombinant HA4 protein 
eluted with 500 mM NaCl using ACTA System. SDS-PAGE analysis shows essentially 100% 
purity of this recombinant protein (FIG. 9). This purification scheme can be used to purify 
homogeneous HA4 and determine the three-dimensional structure of the protein. 
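
The amount of virus stock needed to reach a given multiplicity of infection follows from the number of cells and the titer of the stock. The Python sketch below shows that arithmetic; the function name `virus_volume_ml`, the cell count, and the titer are hypothetical values chosen for illustration, since the disclosure states only that the multiplicity of infection was greater than 5.

```python
def virus_volume_ml(moi, cell_count, titer_pfu_per_ml):
    """Volume of virus stock (ml) giving `moi` infectious units per cell.

    The multiplicity of infection is the number of plaque-forming units
    added per cell, so volume = moi * cells / titer.
    """
    if titer_pfu_per_ml <= 0:
        raise ValueError("titer must be positive")
    return moi * cell_count / titer_pfu_per_ml


# Hypothetical culture: 2e7 cells, stock at 1e8 pfu/ml, target MOI of 5.
volume = virus_volume_ml(moi=5, cell_count=2e7, titer_pfu_per_ml=1e8)
```

Because the stated condition is a lower bound, any volume at or above the one computed for a multiplicity of 5 would satisfy it under these assumed values.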

EXAMPLE 5 
Production of Mouse Monoclonal HA4 Antibody 

Due to the inability to obtain a high-titer antibody, rabbit anti-HA4 polyclonal peptide 
antibody and chick anti-HA4 polyclonal peptide antibody could not be produced. One possible 
reason for this is that HA4 protein exists in serum and is very highly conserved. To resolve this 
problem, recombinant mouse HA4 protein was injected into HA4 knock-out mice. Five 
micrograms of recombinant HA4 protein was injected into the paw of a knock-out mouse five 
times every other day. Lymph nodes of inguinal regions were then removed and lymphocytes 
were prepared. These lymphocytes were fused with mouse myeloma cells, generating 
hybridomas. Monoclonal antibodies in the conditioned media of these hybridomas were 
screened by ELISA using recombinant HA4 protein and Western blotting. At least three clones 
were identified that secrete high-titer monoclonal antibodies. These monoclonal antibodies can 
be used to measure levels of HA4 in human serum to detect whether changes occur in bone 
diseases. 

* * * * * * * * * 

All of the compositions and/or methods disclosed and claimed herein can be made and 
executed without undue experimentation in light of the present disclosure. While the 
compositions and methods of this invention have been described in terms of preferred 
embodiments, it will be apparent to those of skill in the art that variations may be applied to the 
compositions and/or methods and in the steps or in the sequence of steps of the method described 
herein without departing from the concept, spirit and scope of the invention. More specifically, 
it will be apparent that certain agents which are both chemically and physiologically related may 
be substituted for the agents described herein while the same or similar results would be 
achieved. All such similar substitutes and modifications apparent to those skilled in the art are 
deemed to be within the spirit, scope and concept of the invention as defined by the appended 
claims. 

CLAIMS 

1. An isolated nucleic acid segment encoding a polypeptide comprising the sequence as 
shown in SEQ ID NO:2. 

2. The isolated nucleic acid segment of claim 1, wherein the nucleic acid segment 
comprises the DNA sequence as shown in SEQ ID NO:1. 

3. The isolated nucleic acid segment of claim 1, further comprising a promoter operably 
linked to the region encoding said protein. 

4. The isolated nucleic acid segment of claim 3, wherein said promoter is an inducible 
promoter, a constitutive promoter or a tissue specific promoter. 

5. The isolated nucleic acid segment of claim 4, wherein said tissue specific promoter is a 
bone specific promoter. 

6. The isolated nucleic acid segment of claim 1, wherein said nucleic acid segment is 
comprised within a viral vector. 

7. The isolated nucleic acid segment of claim 6, wherein said viral vector is selected from 
the group consisting of an adenoviral vector, a retroviral vector, an adeno-associated viral 
vector, a vaccinia viral vector, a herpesviral vector and a pox viral vector. 

8. The isolated nucleic acid segment of claim 1, wherein said nucleic acid segment is 
comprised within a non-viral vector. 

9. The isolated nucleic acid segment of claim 8, wherein said non-viral vector is comprised 
in a lipid carrier. 

10. The isolated nucleic acid segment of claim 1, further comprising a region encoding a 
selectable marker protein. 

11. A nucleic acid segment characterized as: 

(a) a nucleic acid segment comprising a sequence region that consists of 14 
nucleotides that have the same sequence as, or are complementary to, at least 14 
contiguous nucleotides of SEQ ID NO:1; or 

(b) a nucleic acid segment of from 14 to 10,000 nucleotides in length that hybridizes 
to the nucleic acid segment of SEQ ID NO:1, or the complement thereof, under 
stringent hybridization conditions. 

12. The nucleic acid segment of claim 11, wherein the segment comprises a sequence region 
of at least 14 contiguous nucleotides from SEQ ID NO:1 or the complement thereof. 

13. The nucleic acid segment of claim 11, wherein the segment comprises a sequence region 
of at least 17 contiguous nucleotides from SEQ ID NO:1 or the complement thereof. 

14. The nucleic acid segment of claim 11, wherein the segment comprises a sequence region 
of at least 20 contiguous nucleotides from SEQ ID NO:1 or the complement thereof. 

15. The nucleic acid segment of claim 11, wherein the segment comprises a sequence region 
of at least 25 contiguous nucleotides from SEQ ID NO:1 or the complement thereof. 

16. The nucleic acid segment of claim 11, wherein the segment comprises a sequence region 
of at least 30 contiguous nucleotides from SEQ ID NO:1 or the complement thereof. 

17. The nucleic acid segment of claim 11, wherein the segment is at least 17 nucleotides in 
length. 

18. The nucleic acid segment of claim 11, wherein the segment is at least 20 nucleotides in 
length. 

19. The nucleic acid segment of claim 11, wherein the segment is at least 25 nucleotides in 
length. 

20. The nucleic acid segment of claim 11, wherein the segment is at least 30 nucleotides in 
length. 

21. An isolated polypeptide comprising the sequence as shown in SEQ ID NO:2. 

22. The isolated polypeptide of claim 21, wherein the polypeptide is comprised in a 
pharmaceutically acceptable carrier, diluent or excipient. 

23. The isolated polypeptide of claim 22, wherein the pharmaceutically acceptable carrier is a 
lipid carrier. 

24. The isolated polypeptide of claim 23, wherein the lipid carrier is a liposome. 

25. The isolated polypeptide of claim 23, further comprising a bone tissue targeting agent. 

26. A recombinant host cell comprising a nucleic acid segment encoding a polypeptide 
comprising the sequence as shown in SEQ ID NO:2. 

27. The recombinant host cell of claim 26, further defined as a prokaryotic host cell. 

28. The recombinant host cell of claim 27, wherein the prokaryotic host cell is a bacterial 
host cell. 

29. The recombinant host cell of claim 26, further defined as a eukaryotic host cell. 

30. The recombinant host cell of claim 29, further defined as a bone cell or bone cell 
precursor. 

31. An antibody that is immunologically reactive with a polypeptide comprising the sequence 
as shown in SEQ ID NO:2. 

32. A polyclonal antisera that is immunologically reactive with a polypeptide comprising the 
sequence as shown in SEQ ID NO:2. 

33. A method of identifying a subject at risk of or suffering from a bone degenerative disease 
comprising: 

(a) obtaining a bone tissue sample from said subject; and 

(b) assessing the expression of HA4 in said sample, 

wherein a reduced amount of HA4 expression in said sample, as compared to the HA4 
expression observed in a healthy subject, indicates that said subject is at risk of or suffers 
from a bone degenerative disease. 

34. The method of claim 33, wherein assessing comprises measuring HA4 mRNA levels or 
stability. 

35. The method of claim 33, wherein assessing comprises measuring HA4 protein levels or 
stability. 

36. A method of treating a bone degenerative disease in a subject comprising increasing the 
level or activity of HA4 in bone tissues of said subject. 

37. The method of claim 36, wherein increasing the level or activity of HA4 comprises 
administering to said subject a therapeutically effective amount of an expression vector, 
wherein said expression vector comprises a nucleic acid segment encoding an HA4 
polypeptide under the transcriptional control of a promoter. 

38. The method of claim 37, wherein the promoter is a constitutive promoter, an inducible 
promoter or a tissue specific promoter. 

39. The method of claim 38, wherein the tissue specific promoter is a bone specific promoter. 

40. The method of claim 37, wherein the expression vector comprises a non-viral vector. 

41. The method of claim 37, wherein the expression vector comprises a viral vector. 

42. The method of claim 37, wherein said expression vector is administered endoscopically, 
intravenously, intraarterially, intramuscularly, intralesionally, percutaneously, or 
subcutaneously. 

43. The method of claim 37, wherein said expression vector is administered directly to a 
bone tissue. 

44. The method of claim 37, wherein said administration is repeated. 

45. The method of claim 36, wherein increasing the level or activity of HA4 comprises 
administering to said subject a therapeutically effective amount of an HA4 polypeptide. 
46. The method of claim 45, wherein the HA4 polypeptide is formulated in a lipid carrier. 

47. The method of claim 46, wherein the lipid carrier is a liposome. 

48. The method of claim 45, wherein the lipid carrier further comprises a bone tissue 
targeting agent. 

49. The method of claim 45, wherein said HA4 polypeptide is administered endoscopically, 
intravenously, intraarterially, intramuscularly, intralesionally, percutaneously, or 
subcutaneously. 

50. The method of claim 45, wherein said HA4 polypeptide is administered directly to a bone 
tissue. 

51. The method of claim 37, wherein said administration is repeated. 

52. The method of claim 36, further comprising administering a second agent that induces 
bone formation. 

53. The method of claim 52, wherein said second agent is estrogen, raloxifene, alendronate, 
salmon calcitonin, a vitamin D analog, fluoride, or a PTH analog. 

54. A non-human transgenic animal, cells of which comprise one allele of the HA4 gene that 
does not express a functional HA4 product. 

55. The non-human transgenic animal of claim 54, wherein said animal is a mouse. 

56. A non-human transgenic animal, cells of which comprise an expression cassette 
comprising an HA4 5'-regulatory region operably linked to a screenable marker gene. 

57. The non-human transgenic animal of claim 56, wherein said animal is a mouse. 

58. The non-human transgenic animal of claim 56, wherein the screenable marker gene is 
luciferase, green fluorescent protein, or β-galactosidase. 

59. A method of expressing an HA4 polypeptide in a cell comprising transferring into said 
cell an expression construct encoding an HA4 under control of a promoter active in said 
cell, wherein said expression construct effects the expression of the HA4 polypeptide. 

[Figure: Northern blot analysis of HA4 expression in adult mouse tissues] 

[Figure: Northern blot analysis of HA4 expression during mouse embryogenesis] 

[Figure: In situ hybridization of HA4] 

[Figure: X-gal staining (HA4 heterozygous embryos)] 

[Figure: X-gal staining (HA4 heterozygous embryo)] 

                    cgc gcg gcc ccc ccg cag ctg ctg ctc ggt ctc      48
                    Arg Ala Ala Pro Pro Gln Leu Leu Leu Gly Leu
                                    10                  15

ttc ctt gtg ctg ctg ctg ctt cag ttg tcc gca ccg tcc agc gcc tct      96
Phe Leu Val Leu Leu Leu Leu Gln Leu Ser Ala Pro Ser Ser Ala Ser
            20                  25                  30

gag aac ccc aag gtg aag caa aaa gcg ctg atc cgg cag agg gag gtg      144
Glu Asn Pro Lys Val Lys Gln Lys Ala Leu Ile Arg Gln Arg Glu Val
        35                  40                  45

gta gac ctg tat aat gga atg tgt cta caa gga cca gca gga gtt ccc      192
Val Asp Leu Tyr Asn Gly Met Cys Leu Gln Gly Pro Ala Gly Val Pro
    50                  55                  60

ggt cgt gat ggg agc cct ggg gcc aat ggc att cct ggc aca cct ggc      240
Gly Arg Asp Gly Ser Pro Gly Ala Asn Gly Ile Pro Gly Thr Pro Gly
65                  70                  75                  80

atc cca ggt cgg gat gga ttc aaa ggg gaa aag gga gaa tgc tta agg      288
Ile Pro Gly Arg Asp Gly Phe Lys Gly Glu Lys Gly Glu Cys Leu Arg
                85                  90                  95

gaa agc ttt gag gag tcc tgg acc cca aac tat aag cag tgt tcg tgg      336
Glu Ser Phe Glu Glu Ser Trp Thr Pro Asn Tyr Lys Gln Cys Ser Trp
            100                 105                 110

agt tcg ctg aac tat ggc ata gat ctt ggg aaa att gcg gag tgt aca      384
Ser Ser Leu Asn Tyr Gly Ile Asp Leu Gly Lys Ile Ala Glu Cys Thr
        115                 120                 125

ttc acg aag atg cgc tcc aac agt gct ctg cga gtt ctg ttc agt ggc      432
Phe Thr Lys Met Arg Ser Asn Ser Ala Leu Arg Val Leu Phe Ser Gly
    130                 135                 140

tca ctt cgg ctc aaa tgc agg aat gca tgc tgt cag cgc tgg tat ttt      480
Ser Leu Arg Leu Lys Cys Arg Asn Ala Cys Cys Gln Arg Trp Tyr Phe
145                 150                 155                 160

aca ttt aat gga gct gaa tgt tca gga cct ctt ccc atc gaa gcc atc      528
Thr Phe Asn Gly Ala Glu Cys Ser Gly Pro Leu Pro Ile Glu Ala Ile
                165                 170                 175

atc tat ctg gac caa gga agc cct gag tta aat tca act att aat att      576
Ile Tyr Leu Asp Gln Gly Ser Pro Glu Leu Asn Ser Thr Ile Asn Ile
            180                 185                 190

cat cgt act tcc tct gtg gaa gga ctc tgt gaa ggg att ggt gct gga      624
His Arg Thr Ser Ser Val Glu Gly Leu Cys Glu Gly Ile Gly Ala Gly
        195                 200                 205

ttg gta gat gtg gcc atc tgg gtt ggc acc tgt tca gat tac ccc aaa      672
Leu Val Asp Val Ala Ile Trp Val Gly Thr Cys Ser Asp Tyr Pro Lys
    210                 215                 220

gga gac gct tct act gga tgg aat tcc gtg tct cgc atc atc att gaa      720
Gly Asp Ala Ser Thr Gly Trp Asn Ser Val Ser Arg Ile Ile Ile Glu
225                 230                 235                 240

gaa cta ccg aaa taa                                                  735
Glu Leu Pro Lys
                245

<210> 2
<211> 244
<212> PRT
<213> Artificial Sequence
<223> Description of Artificial Sequence: Synthetic Primer

<400> 2

Met His Pro Gln Gly Arg Ala Ala Pro Pro Gln Leu Leu Leu Gly Leu
1               5                   10                  15

Phe Leu Val Leu Leu Leu Leu Gln Leu Ser Ala Pro Ser Ser Ala Ser
            20                  25                  30

Glu Asn Pro Lys Val Lys Gln Lys Ala Leu Ile Arg Gln Arg Glu Val
        35                  40                  45

Val Asp Leu Tyr Asn Gly Met Cys Leu Gln Gly Pro Ala Gly Val Pro
    50                  55                  60

Gly Arg Asp Gly Ser Pro Gly Ala Asn Gly Ile Pro Gly Thr Pro Gly
65                  70                  75                  80

Ile Pro Gly Arg Asp Gly Phe Lys Gly Glu Lys Gly Glu Cys Leu Arg
                85                  90                  95

Glu Ser Phe Glu Glu Ser Trp Thr Pro Asn Tyr Lys Gln Cys Ser Trp
            100                 105                 110

Ser Ser Leu Asn Tyr Gly Ile Asp Leu Gly Lys Ile Ala Glu Cys Thr
        115                 120                 125

Phe Thr Lys Met Arg Ser Asn Ser Ala Leu Arg Val Leu Phe Ser Gly
    130                 135                 140

Ser Leu Arg Leu Lys Cys Arg Asn Ala Cys Cys Gln Arg Trp Tyr Phe
145                 150                 155                 160

Thr Phe Asn Gly Ala Glu Cys Ser Gly Pro Leu Pro Ile Glu Ala Ile
                165                 170                 175

Ile Tyr Leu Asp Gln Gly Ser Pro Glu Leu Asn Ser Thr Ile Asn Ile
            180                 185                 190

His Arg Thr Ser Ser Val Glu Gly Leu Cys Glu Gly Ile Gly Ala Gly
        195                 200                 205

Leu Val Asp Val Ala Ile Trp Val Gly Thr Cys Ser Asp Tyr Pro Lys
    210                 215                 220

Gly Asp Ala Ser Thr Gly Trp Asn Ser Val Ser Arg Ile Ile Ile Glu
225                 230                 235                 240

Glu Leu Pro Lys

<210> 3
<211> 20138
<212> DNA
<213> Artificial Sequence

<220>
<223> Description of Artificial Sequence: Synthetic Primer

<220>
<221> modified_base
<222> (265)..(18755)
<223> N = A, C, G or T/U

<400> 3
gaattcggat ccagacaatg agaaggtaac aatacaaagc ttcttttggg gaagtccagg      60
tactttattc agttgcaaat tagcaaggct caagctcagt ctctggcctc ctagtgggcc      120
atcttgctct atttttttct ttgtgaaaaa acacagactg aacccctacc ccagattaga      180
accggatctg gaccctttca atcattagtt agtgggtcct tccattttac catggcataa      240
gaattagaag taactggatg ccagnggcaa tccgctgcag actgaccttt aatatcagtt      300
tgcaaaaaat ttagaacaaa taaggcaaag gcaagatgtg cctttggtga ctttggggga      360
tataactgcc cctgttttgt ctaaaaaggc cagaaactct ctggaagtgg aggctgtgct      420
ctaaattaac ttgctaggta attaagaatt ttagctatct acaatatgaa tctctaatat      480
tggactaaat attagaccag tccaagattg aataactggt cacactgaag aaaagagaaa      540
aatcattatg actctattaa atgactaatc ttactaagtc agattatagt ctctgatgtt      600
cttttgccat aaagttaaaa ggcttgaagc tagtccaaac tagaaaaatg gcaagcaaag      660
atgggtctaa aacatgtacc caatttagtt tcctagtcac ttcaatcagg gcactgagtc      720
aactcacctg ccagctcatc acacgaatta accatcatgc ttctatagca ttctgtggag      780
agaacctatg tttttccctt ccctaataag gtagcagttt aggataaaac cataagccaa      840
taagtagatc tttttaagat actataaagc atttattaaa ttctttgaca tctaaatttc      900
aaaattcaaa aataaaagta tgatttaaat agtgtgcacg ggaatgtact aaataatatg      960
cacggggata taatttcccc cttctttgtc ttttaagaag ctaaagtttt aaagtttcac      1020
aatatctagt ccttagaatt aataacttgt tgacaaaaca tctcaaatac ttgactgtat      1080
tcatgtaaca aaggcagtga ctaattgcac tttcaattgt ttttcttaag gagcagttgt      1140
aactagaaag cttgaaaagt tattaatagt cacatgtgca taatttattt atttactaga      1200
aataagagta tttatttgca gcaatttatt aagtaaaggt cctcaagaat taacactatt      1260
atgtggaaac aagtagtaac tgaggacact aagaggaaag aattatacat gcacaatttt      1320
caagaatact aaaatattat cttaaaccat aactgctttg aatatacaaa gaacaaggtt      1380
atatattcaa tctatcttat atgtacgacc tatagactgc ctgagatata aatttagcat      1440
ggcattgttt taagatgtcc aggcaatcta gaacaattta ttaaatgtct aaaaaccacc      1500
ttcttaaaag gacagcattt tttttctatg agttgtatac gcataagcaa ttgcaaattt      1560
gagaattaat agcatctgag ggacgaacaa tcttaagtaa ttgtgagctc tgagatataa      1620
cattgactat taatatacaa attaaggttt gattgttcag cattttaaaa caccatctac      1680
agcacacaac tagattatct gtgctgaaac gggggggggg gactcaaaag aataagcatg      1740
tgatctaatt cacatgtact tacatctcat ataggggttg ttttcaaaca caataattgg      1800
gatgtatgca tgtatgaatt tatagaagat acaaaagcat gtataaaaac aatttttaac      1860
ttatctttgg gatattgatt gtcaatctat ttaaaatgat ggtatgtcat aggtcaatta      1920
ttaatatttt gtaataacca aataaattgc tgtattgaac aaggaacata cacttagtca      1980
ggtttgnttt ttgttttttt gcttttttgn tttgtttntt tttttttttg gnttttcgag      2040
acagggtttc tctatagccc ttggctgtcc tggaactcac tttgtagacc aggctggcct      2100
cgaactcaga aatccacctg cctctgcctc ccgagtgctg ggatcaaagg cgtgcgccac      2160
cacgcccggc tattagtcag gtttttaaga cacaattttt ttttaaaaaa aaaatacacg      2220
aatatacatt acaattcttt attagtacca cagtggttcc aataggatgc tgaaactgta      2280
ttttgaggtg aaaaccacct caatcttggc tgcctgccct cggccttcct ctcaggcana      2340
agaggctgaa ggcancccaa tctgttgtag accanaaaac ccntgccgag tcttagtggg      2400
aaaaatatgg aggctcatgg ggcaaaggtg aacgcngccc tctcctgagt tcgctgtaaa      2460
gccacctgct ccttggcggc ctcttgtacc actatagctc tgggaaactg tgtgtctcaa      2520
ggccccaccc cacagcagcc agcagctagc tggcttcaca tgggtcaatg atcggctggg      2580
aggatgagaa aaatgacttt acaataatgt tttctcttgc caaagaattg ttttgagcac      2640
agctaatatc taattaccat tgattgtaat taataacaat tataaaagct ttctttattt      2700
tctgtaaagc cttctgtagt aaactaaaac cctaagtaat aaaaagatat tgcctctgag      2760
tcttttgggt gagagcaagg atttaaagta aacttctctt gagaggcatt agctaataaa      2820
atattttcca cttaggaaac aatatacact gaaagattaa aactcttggc ttcttgtata      2880
gaagcagaaa tgataaaaat ttctcacata atgtagatca atatttagcc atactagagg      2940
ctaaatattc cagtngttnc anactcacta gaagcaaaac ctttacaatc attaagangc      3000
aaaggagaga gaaaaagaaa caaactttca gatctataat aactatataa aaggtaggca      3060
agccagattg taatgccttg ataaattata cagtctgatt gaccctttat aaactctaaa      3120
tttgaacttt attaaaacaa catctggcta gatccgtaaa ggtgctcttfc tagctaaaag      3180
aatatctcct ggtttgcaag acaaaggaaa gtcacacaaa ccgaacagaa gctgctctga      3240
gcttagttcc tcttgccatg aggacatata ttagaattac taagtttctt ttgcaccatt      3300
aactttggga aagtaaaact ccattttaaa acaatttatc actttcggaa tcaacattaa      3360
aactttacta atacattgcg aaatttggcc tgctgtgcca cgtgcttgag aagactcctg      3420
agttgccaat tttaaggcta accttctgtt accaatgtaa caattattta atcttaacca      3480
ataacttatg agctgattgt gaaaacacgc agagcacatt accaataaaa aaaatgaagc      3540
agttacactt agaccaatca gagaatatta atttttctag ctacaatcat agaataaaag      3600
cactttatca tttgccaaat acattaataa cgattgcaga gaaccctcca ggaaaaaacc      3660
atttatcagc ttttttaccc caaaaccaag aggctgcaga ctctcttttt tcctataaag      3720
aacagtttct ccagcagggg ccctcctgcg gtgtgctggt ttcctggtta aagcatgtgg      3780
ccactcttgg ctttgatgaa ccaccagata aagtttttta gccatcttta ttatccaaca      3840
taaaacatag aacattcatt cattctgact ggacacgaac atacatgaat tgaacatgtg      3900
aacggattgg tgcacccaaa cacaaaaaca caagagggca gagaacttct gctgtaaggc      3960
ccagagagaa caaagcagcg cacccctgaa atttctctct gcttctggga cattttcctc      4020
ccacatgatg cagtttctaa ctgtggccat ccacctgatt attaaatccc ttttattaaa      4080
ccctttcttt tctcacgcct aaaaaaagca acttcttgga gctgcggcta actgcctgat      4140
aaatttcact gctgaaacta agtgaattta gcgcgcaggt ttaacgcagt ttcaacttag      4200
cccataccaa ccttctccgg tacgaatttt ctttaattat tttcttcttc catggagctc      4260
caaaccagca atgcttcttc caaatgtcct ctgctgtttc ccagtcccgc gttggttcgc      4320
caatctgttg aggccagcca gcggcccaca tgtctgggtt ctagcctggg aggtatcttg      4380
gaatctggaa gagaagaggg aactaggcgg ctcgagagag aatggaacga taatggaacc      4440
aagacatcag tctgatcaag gttcaatttt actatttgga gacactgggt tatgaagtag      4500
agggaggggc ccattcctgc caaatcatcc ttggagtcca gtttcaggtg accacgtgtt      4560
ggctccggaa cagctaggcc gcaggtagca gcagtgggag tgacaggctc cacccctgat      4620
gctctnttag ccctaacagt caaacctgat cagccagatt tcaggctggg ggggaggtta      4680
caatatttgt tggtcttttt ttcatgcagc ttttgggggg gggttgtttt gttttgtttt      4740
attgagacag gaatttgctg tgtagccttt attgtcttgg aatttgctct gtaaactagg      4800
ctgtccatga actcacagag atctacctgc ctttgcctcc ctagtgctgg ggttaaaggt      4860
gttcaccacc accacctagc ccataatctc atattctatc aagggattta ggttccataa      4920
atgcacttat cataagaggt tccaacagat ggaacacaac tacacagtat aaggtggact      4980
aatacatgtg tgtccttggt gaagaatccc ttatcttatg tcctttcatg tgcttgcttt      5040
atcaaaacat cctttcacct gtgtctgctt taggataaca ctccttcaca tgtttgccac      5100
agcaaagcac catcagacac gactgacttt ccaaagaacc cttaagtttc cacttcagat      5160
aggcatcttc ttctttaggt taagaaaatt tttttcctgt cactctgtgg aaaatatttt      5220
ctgtgcattt gacctgtgtt tcttctcctt ccttaattct tattttagat ttggtcactt      5280
cacagtgtcc cagatttcct ggatgttttg aaccaggagt tatttttaga ttgaacattt      5340
tctttgacct atgtattcat ttcttctacc atgtcttcag tgtctgagat tctctttttt      5400
aattcttgaa ttctgttggt gaagcttgcc tctgtggttg ctgttcaaat tcctatattt      5460
tcatttccat atttccctca gtttggggtc tctttattaa ttctgttgga ggaagggggt      5520
ccaggagtca cctcacaaat cacatattcc caggattgat tgatcaggac caccggccag      5580
actcaggagc tgaactgtga tgtagagtca agatcctaca gggcttttaa agcctgagag      5640
ctataataac catctctgct aagttactcc accaatcata acttagggat agggctttct      5700
gtaggagcat gtctttgttg tgtacttatt ttgctcctat tggttagggt attcaactat      5760
ggcagaggac ttgcctcatc ttatattcat gtcttagctt gccaaccagg atatcagttt      5820
tccacataca tgtctctttc tgtcaagtag gatatcaagt tcccagggag gtcttggaaa      5880
cctaaacttt attctgcccc tactcaaaat ggaagtctta ttctaaatag gtacaggtgt      5940
ctctcttatg ttgggatcca tcccaagggc agcttaaaag gcaaatacta taaaggctga      6000
tacacaggtg cagaaagtgt tggtttctga gaacatccta gtaacagaag taacagcata      6060
tgagaaagtt tccttgtgat attaggaaca gcacaaactg gtaggttaga cgggtaacag      6120
ttaccaagac cttaacaatc ccagttcctc tttcaggtct tgagtggctg tattcctttc      6180
cttccattgt ttgtgctttc atagacttct ttaagggatt taatgttttc tttttaagga      6240
cctctagcat acacatatag gctgtgttaa ggtctttatg tgtgcttcca gggtgtaata      6300
ctcagggcct gctgtgatag ggttggtggg ttctagtgga gacctatcgt cctggctgta      6360
attggttgtg caggggtagc ctgtaggttc ccaatgagtg tgtgcctgag ctggatgctt      6420
gggaaaaaca ttgagtgacg ggaggaaagt ggggggccag ggatctgtat gcttcactga      6480
agatgggtgc agaagcagcc tagggctgag actgaggggt tccactctga gaagcagagg      6540
gagaggtgaa gatctgcagt tagcccacct gcgtccctgc ccagtgtggc ctgtgggttc      6600
ccagggagtg ccggctggag ttgggggtgg agggtaggac agggcaatga gtgggggaag      6660
ggaatttagg aggggaagat ctgtgggatc caccagcgat gaggtggctg tggtggaagc      6720
cgctgcagga gttagcgcag agctcaggat gaaactaggg attgggcgtg gaggaatgga      6780
gggagcgtgg aggtcgcctc tccctctccc tggataggta ggtcacccat ttgcttcccg      6840
tcagagagtg cctaagagag ttggaggctg ctttcctggt tgaagttgga tagaactttt      6900
caaactatat taatctgatg tgaaataagc acgtgaaagt gaacctccag cactgaatgt      6960
tggctatttt ctaccagcct cagttcacct atgaatggag actccaggct gcatcgtccc      7020
ccacaggatt gccaagaggc cacgtgaaca aagctttaca ttttggagtt tagaaggggg      7080
taacactcaa acactatcga ttatttgagt cataggactc ttatagactg ttatattctg      7140
gctctctcca tttaataccc caaatgtcac tttttttttt ttttttaata atgagcctta      7200
gttttttagg aatgaaggaa cacgaaggtg attcctgagg ccgagttaag acacgtgcct      7260
ctaagaaact caggagttgc ggtctccatt cccccaccac caccacctgt ggtttctgac      7320
cactgtcacc ctgcctggtc tctgctttcc tctctggttc tgcagcaccc cgcgggggtc      7380
tgggcgggcg gagctgcgga ggaggggcgg gctagacccg ggacccaggc ctataacagt      7440
atgcaaagct ccccggcgtc cagggggtgg gagggaaaaa ggaggccggc ctcaatgaaa      7500
ggcgcattga tgcggcgggc tgcagggctg ggccagacgc tgagcagggt caggctcctg      7560
ccgacccctt tacctcctgc tccgcgcttc gcagccaccg cacaccatgc acccccaagg      7620
ccgcgcggcc cccccgcagc tgctgctcgg tctcttcctt gtgctgctgc tgcttcagtt      7680
gtccgcaccg tccagcgcct ctgagaaccc caaggtgaag caaaaagcgc tgatccggca      7740
gagggaggtg gtagacctgg taagtctgag agtcggtcct gacctcagtg ctggaagaga      7800
ggactcagcc aggatcgcac cggaagggca tcagtataga tggtggtggt gctgaccgta      7860
ggggtgagtg tagggcagca cgttaagaag cttgagtgcc tcagtgtcct gccttgtgta      7920
cctgtgtggg gacggatctg acgcacgcct gcagcagagt cttgaaccgc tacgggagat      7980
catgagaggt caccacatgc tccgacgtgg gtcaggtggg atgcccaaat ccgtgtagtc      8040
gcccagtaat ttctggctcc aggggaggcc accgttggga gaagtggggg atgctgtggc      8100
tgcaactgga gtagactgag ttagtcagtt gatttcaaaa gaaagcccga ggaagaccct      8160
aggccagctg gtcgcttggc cctgggccaa ggctgtgcaa cgtgtccttt gtgaggacca      8220
gtggccacga tctgccacgt ctgcctggag gagangctaa caacccccac aaagcatttg      8280
ttcagctaac ttgaagatta tgaatcactt tgtgtcatct ccctgggaaa tatgaactgc      8340
agtttactcc ttagaggacc acagcttgag ccaggagtgg tcagagactt tgaagctgaa      8400
ggggaaaaat gaaggcccca ctaggagccc ttccaaggac ccatttttgc ctgatctgtt      8460
taaaacagat gagcagtcag gtcttaacct gtgactgcca gtcaggaaca ctgtactcaa      8520
gctaagggga aggaaagcgc ttccaggaaa gcaaatatcc caagggcttt ctgagaggct      8580
aatctgtggg aaagtctgtt tgcttaaaac ctttccctct aaaagtcaat aaacctagtg      8640
gagggcagag agfcttgtctg tcccactcaa gagccagcca tcgatagatt tgtagtcttt      8700
ggcacatcat aaacttctgt ccttaaacca agctatatgg ttgtcaggca ctgcgataca      8760
taaaggacag gggacattta cttattttat tattattatt attattttag atttgaattt      8820
cttccactga cattctaagt tgagctaata aaccaagctc cttgacagct agttctaaac      8880
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tgattcaaaa 
tgttttatgc 
tctattttga 
gcagtttgcg 
ctagcatttt 
catactgtgc 
aaagtccaca 
gctttcaaag 
aaatggggcg 
ttcactaatg 
tactctatgt 
tttacaacaa 
aaagtctaac 
taagcaacgc 
taaggttcta 
ctctctctct 
gtgtgtgtgt 
ttgacaggat 
cacacgtgtg 
atttataagg 
cgggatgagt 
acagcacaca 
attattacac 
cttgaggaag 
a tgggaagag 
ctgactgcta 
caaaaggaaa 
agagaaatac 
atggaaccaa 
tttctttaat 
agaaaccctg 
caacaaaccc 
tcaggttctg 
agaacacgga 
tgatcatttg 
ttttgtactg 
actacctccc 
tcctttgcac 
gatgggagcc 
ttcaaagggg 
tataagcagt 
agtaaagccc 
caaatttatt 
ggtaatgtgc 
ttttataaag 
tagcttttct 
taatacatcc 
gactgggatc 
tgctgaaatt 
ctgtaatatt 
aatgatccag 
gtgtagctga 
gccattggcg 
ccgtatcttc 
atggttctgc 
ttttaagttg 
aaaagttcaa 
ggttatacac 
aatgatgact 
caatagaagt 
tggtttttta 



gcactggggg 
tctgatatat 
gtgcctaagt 
tattatgaat 
ccccctctaa 
tgccaggctc 
cagtcattag 
cgtgtctttc 
ttgggtgagg 
attctaaata 
agcatctttc 
taaatgaaat 
atcatccctg 
tcaaaatatt 
agttcaagcc 
ctgtgtgtgt 
gttttgaggt 
cctaaactct 
ctaccaactg 
aaatgggcac 
ctagtattta 
gaaagaaaca 
acctgctgag 
acagaatgga 
cagaggaggg 
aagttttatt 
aaaatctata 
ccaaaaccaa 
gtcccttgat 
cctagcactt 
tctcttaaaa 
taaaacaaac 

gaggagaaat 
cgaggtgggt 
actctcacat 
gggatcaaac 
tatgccctac 
cgtagtataa 
ctggggccaa 
aaaagggaga 
gttcgtggag 
aaattataat 
ttttattttg 
tgacttttta 
gcatttctaa 
tttttcaaat 
tgctgtagct 
cctgctacgg 
a 9999 aa ctg 
tgtgtaagga 
ggactgttga 
99999 a 9cat 
a 9tggggggg 
ttttgatgac 
ttaccatgac 
tagtatggcg 
gaattgcctt 
acacacacac 
gttatttcaa 
tgccattttc 
ttttttggtt 



aaaatccctg 
aattttcctc 
tgttaaaaaa 
gtgtagttaa 
ggattataca 
aggtgcatcc 
gaaggtttcc 
acgctcccac 
gaatttgagc 
cctattggaa 
ctgagaggaa 
gttaaagtaa 
tttctatgtg 
tccactgtag 
tcattgatgt 
gtgtgtgtgt 
gtggtcccat 
gggtccctgc 
ttaacttcat 
tgaaatgctg 
aaaaaaaaaa 
acagaacaga 
gacaggggat 
ctggagaagt 
gcacatccaa 
tcacaattat 
tggccaagta 
gcattaaaaa 
aaaatatggg 
gagaagcaga 
acaaaacaaa 
aaataaaaag 
tttaaacacc 
tagagggact 
actgtttatg 
cttgagcatc 
actgagtctg 
tggaatgtgt 
tggcattcct 
atgcttaagg 
ttcgctgaac 
aaagttgaag 
ttgttaaacc 
cttaaaaatt 
gatagataat 
tagacaggaa 
tcagagctaa 
gaagtccttt 
aagtcaatgc 
gaagcacgca 
gtggtcctgt 
gcggagcctc 

999999 a 9 aa 
cacctctgtt 
cacccggaaa 
tcatccacgt 
gattctgtgt 
acacacatat 
catttgacaa 
cccccaaagc 
tttggttttt 



ctgtttcacg 
cacaaaagca 
aaaaaagtgc 
gatacataaa 
gtaccacaac 
tgtagattgg 
acagattcta 
tgaattctgc 
ttccattcac 
ctagcatttt 
tttagaaatt 
aaaatttaat 
aactaataca 
ctcaatggta 
aatgtgaaca 
gtgtgtgtgt 
gtatcccaga 
ctctgcttcc 
ggcaagaatg 
agtaaatggc 
aaggttctag 
gaacactagc 
cacgtgaaac 
tctacccact 
ctttagagat 
ttccctagca 
ttctcattat 
tcttgtatag 
attatgggcc 
gagagttctg 
acaaaacaaa 
aaagggattg 
cttaaaatga 
ctgccagttt 
tattcatgac 
ctgcatgcca 
catgttaagt 
ctacaaggac 
ggcacacctg 
gaaagctttg 
tatggcatag 
caaaatataa 
aaaggacctt 
aatttaaaaa 
cactatttta 
ccagtcttgt 
aagtggaaga 
tgtccttggt 
tagtgattta 
aaatagcatg 
gattagtgct 
gtgtgcataa 
gagacagaac 
tctgcctcct 
acagaagctt 
ttttcatctc 
gacaggtttt 
atatatatat 
tggtggactg 
tgtcccttta 
tgagacaggg 



cagcagtggt 
tactgtgttg 
cacatgaatg 
atagtcattt 
tctaccccaa 
atttggttct 
taaagcgact 
cccctggtgg 
aggttttcat 
aagttaagaa 
atcaaatcat 
tggaattcat 
aggataagtg 
gagtacttgc 
tgcatgctct 
gtgtgtgtgt 
ctaggcacaa 
caggggaaag 
cttattatca 
ctggtctcat 
aatccacgcc 
gcttacagat 
taactgggcc 
gcactttcct 
gttgcgatac 
attattttag 
attaatattt 
taatgtaaaa 
tggcagtggt 
ggacagccag 
acaaaacaaa 
taagaagcta 
gttcgccata 
attaaaaata 
tgtatttctt 
ggcaagagtt 
aatggtactg 
cagcaggagt 
gcatcccagg 
aggagtcctg 
atcttgggaa 
gagtttgtat 
aaattaaaag 
aatggtaacg 
taaagcaaac 
aaagtaactt 
tgaacctaaa 
cgccgtgctc 
aaatagtgac 
tcagggaggg 
ttgtctccaa 
ggccctcggc 
tgagaaagtc 
tagctataat 
attttcataa 
aacctcttct 
tcatctagct 
attaaagaaa 
ccttcttatt 
tggttcctgc 
tttctctgtg 



a 999ttttgt 
gagctacagt 
tggctcgtgt 
ccccataaag 
cttggaaaag 
ggtgacagaa 
ttgtataggc 
ccaacacagg 
tttgttgact 
aagacaaaca 
actagaggaa 
tgtgttttga 
caggaattga 
ctgatatgtg 
ctctctctct 
gtgtgtgtgt 
atttgctgtt 
ggattacagg 
tcttcatttt 
acatctaaga 
tttaaattct 
ggtttgctag 
acagactcct 
ctgcataaaa 
aagacagcta 
ggaaatcact 
ataccaaaaa 
gttccaattt 
ggtgcatgcc 
gggtacaagg 
acaacaataa 
acatagaatg 
ataatccatt 
aacgcagctg 
tatttactta 
ctaccactgg 
cccccctttc 
tcccggtcgt 
tcgggatgga 
gaccccaaac 
aattgcggta 
aagtcattgc 
attcaggata 
tttcagctaa 
gcaaaaagta 
taaattaatc 
aattccacgt 
agcttaaaac 
gatggtgatt 
cattttgatg 
gaggcccagg 
tggatgaagt 
cgtgctgttt 
ttggtgcttc 
aactctaagg 
gctctatttt 
accacatttt 
gtgtgatttg 
cacagttatt 
ttttttggtt 
tagccctggc 



8940 

9000 

9060 

9120 

9180 

9240 

9300 

9360 

9420 

9480 

9540 

9600 

9660 

9720 

9780 

9840 

9900 

9960 

10020 

10080 

10140 

10200 

10260 

10320 

10380 

10440 

10500 

10560 

10620 

10680 

10740 

10800 

10860 

10920 

10980 

11040 

11100 

11160 

11220 

11280 

11340 

11400 

11460 

11520 

11580 

11640 

11700 

11760 

11820 

11880 

11940 

12000 

12060 

12120 

12180 

12240 

12300 

12360 

12420 

12480 

12540 
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tgtcctggaa 
gcctcccgag 
taaaacacca 
agattgtagg 
aaggctagct 
aagaaagttg 
cactggaaac 
tgggctactg 
ccaaaaggat 
atgccagcgc 
atagagttga 
tactgaagac 
ccagaagttg 
aaatttaaga 
agtctcttat 
ctccccctcc 
tgtctatgtg 
agattttgtc 
catgggctct 
taacctattg 
tgcaggtgtc 
tgaccagatg 
gagggttgaa 
attactatgt 
ggttggatat 
tgaactggct 
ccaccactgc 
acacttcatt 
ccaggaaggg 
gtatgtggga 
aattcacacg 
ccaggaagac 
tagatatgca 
gggtacaatg 
ggcaggagag 
cctactactg 
gtgtttccct 
tttgagttat 
ttatgggtta 
tctggctgtc 
tctgcgagtt 
ctggtatttt 
ctatctggac 
tggtatgtat 
tttccactgt 
cctctagctg 
aaagggtata 
gctccagaat 
accgtagttg 
acagccagga 
catctgagaa 
attaatgggg 
acctgtctac 
caaggccttg 
tactagacaa 
tgggtatgct 
gaactgcatt 
aaccaggtag 
aatcagccac 
actctctgcc 
ttggtaaatt 



ctcactttgt 
tgctgggatt 
tgatttttag 
tcataagcca 
cttattactg 
agcctggtgt 
ctctggagtt 
gaaaccccga 
ggagagagtg 
cgtgagggcg 
tattcagtgt 
tagctgtttt 
gttttcttgg 
atgttaaaga 
tttactctct 
ccctcccact 
tgtctctggg 
tccctagcag 
tccccatgca 
tgagttcata 
tactacctca 
tcgagtctct 
gaaatgtgtt 
ccttttaaca 
ggattttggg 
tttaacccaa 
actagtggca 
actttttcct 
cgaatcttct 
ccatcagaaa 
taaatgggtg 
acagaacgaa 
tatttaaatt 
caggcagtgt 
gggacagtgt 
cagaagcatc 
agacatcata 
tgatcagtgg 
gcctccataa 
ttgctgccct 
ctgttcagtg 
acatttaatg 
caaggaagcc 
aatagtggtg 
cacagtgggt 
gtctctcctc 
gatgctaggg 
gtttatctta 
tgataatatg 
attctataca 
acttgaaaaa 
atttgatcat 
agtctgttgg 
gggacactta 
tttaagtggg 
gtagccaggg 
cacttcagct 
gaaagctaaa 
ctccagtaag 
tgtttcagtg 
ggaatgcaag 



agaccaggct 
aaaggcgtgc 
cagtggttac 
atgtacagat 
tgctgttcag 
ggtgactcag 
gaaggccatc 
ctaaaacaag 
aacataatcc 
gggcttcctt 
ttttcaatag 
tgaagataaa 
ctcttgtatc 
catgggggag 
cctatgccct 
tcatgtactt 
aacataggga 
ccatcagttg 
tgctgggagt 
tgtgccatgg 
caatcccccc 
gccttaatct 
aatttatagg 
gaataatagt. 
tccagtgaac 
tcactaaata 
ccaccgctat 
tggcagcatg 
atgtcactga 
taggagtatt 
gaagctgaaa 
gcatgttctc 
aggctatagt 
ggaggaggaa 
ctgagcacgg 
ctaaactatg 
gactaccaag 
agtctcaaag 
cttgatggtt 
ggcaggagtg 
gctcacttcg 
gagctgaatg 
ctgagttaaa 
tttctgagtg 
atctaacctg 
cctggactgc 
tgtaatctct 
aaaacaaaat 
ccccaaaatt 
gcagtggctt 
aagattcctg 
tacatcgttt 
cttcaggctt 
agtattgctg 
actgtggact 
agggcagagt 
aaggtggaaa 
tagccacggc 
gaaaggaatc 
accaaccatg 
atgtatgtta 



ggcctcaaac 
gccaccaccg 
caatgaatac 
tttgtgaaga 
gtcatttgtc 
aataaatctc 
cttggctaca 
aacaaccaca 
atcccaacag 
tgtgtgaact 
aaaaaaatac 
taggaagtga 
aatctgataa 
aaggcaggta 
ctccaccaat 
ccttaaaact 
gcctctatgg 
ctcacagctc 
ttggttgatt 
ccctgttgtg 
cgccccactc 
ccatctactg 
tattaaaaaa 
attactttct 
tatagcaggc 
caactattta 
cacagttcac 
cactgagcct 
cagcttaatt 
attcagctgt 
atattcattt 
tctcatgcat 
catagaagtt 
agggaattac 
atatacaata 
tacatccata 
taaaaaagta 
gcctccccaa 
gaagaaactt 
tacattcacg 
gctcaaatgc 
ttcaggacct 
ttcaactatt 
agcctcaaat 
ttagaaataa 
aattcacata 
gatcaaggga 
aaaacaaagc 
gaaaatcaat 
tggttctcag 
ccctgggtac 
gcattgcagt 
aagtagactc 
gcatgcattt 
caccacctac 
cctttagttc 
cctgacaggg 
actacaatgt 
gcactaagca 
aaatctttct 
gaatgtgaga 



tcagaaatct 
cccagcggtt 
ggaaatgttc 
ccagctagaa 
tactgtgcgc 
agtattcagg 
tagctatttc 
accaccacca 
atatccacca 
tttgtatcca 
ataggtttgt 
aataaaacca 
gtaatcttta 
aaaattcaca 
ttctcttcta 
cgaccaggtc 
gtctcatctc 
cacagctagg 
ttgtacaggc 
tctggcaaat 
cccttgtagt 
caaaaggagg 
gaaaccttag 
tctctaggac 
ctgagtttca 
atgaccccct 
agggtgccca 
tccagcacta 
ttccacatcc 
aaagaaaaag 
cgagtgagat 
ggacgccagc 
atcaagttag 
cagcacaggt 
ctgaagacct 
tttgtaaaag 
ccagatatgg 
atttcagact 
tttaatcaaa 
aagatgcgct 
aggaatgcat 
cttcccatcg 
aatattcatc 
ctgcctaaga 
acctctagct 
attttacaga 
ctcagaatct 
aaacaacaac 
gataagttag 
ctttggcaat 
tgctcctcca 
gaggctgatg 
atttctaggg 
tcagctagca 
ctcccactgt 
tgtattcctg 
cacataagta 
caagagccat 
cagacatcga 
ttagatgtga 
gcccagaggg 



gcctgcctct 
cctgcattat 
tgcaaggaag 
ggaagaaaac 
tgttgctcaa 
aagtagagac 
aaggccatct 
taaccaccac 
acactctagc 
accgtctaac 
tagctaaaag 
tctaagaaag 
tccctctata 
gtgtattttc 
tttcctcccc 
cacttggtgc 
tggagaaaac 

ggtggggctc 

cttgcacgtg 
tctgtttcat 
ctctgaacat 
gttctatgat 
gggccagttt 
cttatgagct 
tttttttggt 
aacacttgag 
gctgggtaaa 
tgaaagttaa 
tatgacttaa 
gaaattatga 
ccctccaaac 
tttgaagctt 
taaggggagg 
tagatgggag 
ttgaaaaaag 
gagctaaatg 
ggtacctctt 
gtggtcatta 
atgaaaaatg 
ccaacagtgc 
gctgtcagcg 
aagccatcat 
gtacttcctc 
ggttgtttga 
ggtccatagt 
tctttttttc 
ctggtgtaag 
aacaacagaa 
aaacggcaag 
acaacgggtt 
ggaagattca 
gtagacccac 
tataaggggt 
tggtgtagac 
agaagaggtc 
ggcctcagtt 
cctcaaagtc 
tttcttcaag 
aagtaatgcg 
gtgaagaact 
tatgcgtagg 



12600 
12660 
12720 
12780 
12840 
12900 
12960 
13020 
13080 
13140 
13200 
13260 
13320 
13380 
13440 
13500 
13560 
13620 
13680 
13740 
13800 
13860 
13920 
13980 
14040 
14100 
14160 
14220 
14280 
14340 
14400 
14460 
14520 
14580 
14640 
14700 
14760 
14820 
14880 
14940 
15000 
15060 
15120 
15180 
15240 
15300 
15360 
15420 
15480 
15540 
15600 
15660 
15720 
15780 
15840 
15900 
15960 
16020 
16080 
16140 
16200 
atacagtatc aaaccaaagc agagcaaaaa gcagaacaga aaacagaaca agccaggtgt
ggtggcgcac acctttaatc ccagcacttg ggaggcagag gcaggcagat ttctgagttc
aaggccagcc tggtctacaa agtgagttcc aggacagcca gggctataca gagaaaccct
gtctctaaaa aacaaacaaa caaacaaccc ccccccaata aacaaaaaac aaacaaacaa
acaaacaaaa aaccagaaca aaatgtcctc tttaatataa tcatcctatt aaagggtcag
tgaggtggct tagcaggtag aaggcactct cccaccccct cagagtctga ggacctgtgt
ttgctccctg ggacccatat ggtagaagga gagaaccaac ccttgtaatc agctgtcctc
taatcttcac atgtgcactg tggcacgtgt gtatccacac ctacatatac acactagata
gacggtaggt aggtaggtag gtaggtaggt aggtaggtag gtaggtaggt aggtagatag
gtaggtaggt aggtaggtag gtaggtaggt aggtgcatac atacatacat agatgtaatt
tttttgccaa aaataagaat ctatttaaag tggattcaca gattgaaggg atggtatgag
attaagctag aatttctttt tctcatatag aagtcctatc ttggatttga atagcatcta
gggatccacg ttgaaaagga ctttctttaa aaagacttgc atatcttgat ccaggctctg
gttcaggtct cttccaaagt tgggaagtct taaaagttgg aagtgtaggc ttcccatatg
tttgggtaaa ttgttctttg gttctaagta tctagaacac gttttggttt gagagcactg
cctatagcat agcaatcatg gaaatgcctc caaaaatgtc tatgtatcaa ctaaaaaaat
gagaccttta ttacgaaatc atcaggtact agaattttta attagctttg cattaaacag
aattataacc tgatttccta tagtaaatgc aaaattagtg gttttgcttt ttgttcttgg
gttgtgactg gctctgtttt gcagtggaag gactctgtga agggattggt gctggattgg
tagatgtggc catctgggtt ggcacctgtt cagattaccc caaaggagac gcttctactg
gatggaattc cgtgtctcgc atcatcattg aagaactacc gaaataaagc ctctgacggt
ttcagtccct gcctcgttgg ctttttaaat caagcccttg agtggttcat ttaaatgaca
tttaagaaat cacttaaatg aagtgctcag ctgaatgaaa aagcaaagtt aaatatgttt
acagaccaaa gtgtgatctc acacttaaaa atctagtatt aaccatttta tttcagccaa
agatggtttc aggatttttt tttcattatt attttttaag cctatatatt ggaatgccat
tacagtattt agtatttcct tctataacat ataaaggtta tgtctttgta aggactgtat
agaattattt tatatctgtt aaataaaatg cttctaaaac ctaagtattt gtttattcgt
ttgattgtgt agccctggct gtcttgaaat ctgcactgta gaccaggctg gcctcaaatt
ccgagatctg tcaaccttta tttcccatgt gctgagattg aaggtgtgca ccaccatgcc
tagcaacttt attattttta aattgaagat tttgccactg aagttgaatt cctagtactt
aacatatgtg aaattgaaca taaatctaaa ttttaactat tcatttagca atttatgaaa
ttttagcaca tatatataac atggttctgt atcttgagat atatgaagac ataaggatgt
ttatctgaca ggaatataac agtattttag gaatttttgt ctcctttttt tttttattgg
atattttctt tgtttatact ttaaatgttt tcccctttcc aggtctcccc ttcagaaacc
cccattccat ccccctgctt ctatgagggt gctcccccca cccacccact caaatcttcc
tgccccaccc agcattcccc ttaaatgcca aaaacagtct tttaaaagaa cagcattttt
tcttatgagt tggatatgta taaacaattt gcaaattttg agaatttctt agttctaagg
angaaggaac caatttttta agccaatttg ggaaagctcn tggnaaaatt actttaatcc
cttnttaaaa tattncccaa atttaaangg cctttgggan ttantttcaa aacaattttt
taaaaaaaat aacccgggtn ttaacagggn ncaacnaaat tttaaattta ntcntggggc
cctgggaaan ccnngggang gggaaaaccc cnnnnnnnnn nnnnnnnnnn nnnnnnnnnn
nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnncagtc
agnctttttt tcaanaagaa agggaaatgg aagctaccaa taggtatgta tttattagag
atacatctag agtctgcttt ttgtttaatt cttcaggaat tcaagacttt aagtaccctc
aaagcttaca gtttntacgt tacctagcac gttttcagag agcagcacaa aggttaaaca
aaccatttta catccatgtc attttgtaag tgcttgactt tgctgccaaa gtgaatacat
gttatctaca gaattttatt ttctattgaa aacagtgcct gtatcccaag gagaattagc
tctgattatt cctgaactca aacatactct ttagttggtt tcttttcttt cttttttttt
tttttttgat acagtggaaa taaaacaaac aaacaaacaa taacaacaca catgcatacc
ttaaaataaa acagtaacaa caacaacaac acacacacac atcccttaag gtcaaaaact
taaataactg ggccataaat ggatatatac tcaagtagaa tagcctctcc caaggcatgc
cagccatcag tggatcccat tctggttcac acacctgctt gtgggttggc tggcccttta
tgttcaaagg tgcccccatc ctgctctgcc tttccttttg ccaacctccc caacttgcac
cttctcttag tctcggttac ttttcaagct ccagttcttc acttttatct gccaactcca
agaaagagga atgatgctcc ttcagagctt gccctctaac ttccggttct gtaattcaaa
tgggattaaa gttctcagcg gcatcaacct tcaccctact gaaatagtta caggcttgat
gacactcctt ttgagttgct cctgttgaac attcctgctt ctcctaagat gtagacacag
ttctctggcc tacctactgt gttctctttt gaccattaaa acctcaacat gtttatagta
aagctccaag agttctcaag ttctggtctc agtctttcag cttgctcttc caattatact
gataactcgc tgctcataac aggccacctc tgctaccacc cgccatactt ctctccaaaa
taacttcctt caagtccagc ccaccgccct tctaaaatat tgccccggtt ccttataagc